Rounding Semidefinite Programming Hierarchies via Global Correlation

Boaz Barak* Prasad Raghavendra^ David Steurer* 
January 20, 2013 

Abstract 

We show a new way to round vector solutions of semidefinite programming (SDP) 
hierarchies into integral solutions, based on a connection between these hierarchies 
and the spectrum of the input graph. We demonstrate the utility of our method by 
providing a new SDP-hierarchy based algorithm for constraint satisfaction problems 
with 2-variable constraints (2-CSP's). 

More concretely, we show for every 2-CSP instance $\Im$ a rounding algorithm for $r$
rounds of the Lasserre SDP hierarchy for $\Im$ that obtains an integral solution that is at
most $\varepsilon$ worse than the relaxation's value (normalized to lie in $[0, 1]$), as long as

$r > k \cdot \mathrm{rank}_{\ge\theta}(\Im) / \mathrm{poly}(\varepsilon)$,

where $k$ is the alphabet size of $\Im$, $\theta = \mathrm{poly}(\varepsilon/k)$, and $\mathrm{rank}_{\ge\theta}(\Im)$ denotes the number of
eigenvalues larger than $\theta$ in the normalized adjacency matrix of the constraint graph of
$\Im$.

In the case that $\Im$ is a Unique Games instance, the threshold $\theta$ is only a polynomial
in $\varepsilon$, and is independent of the alphabet size. Also in this case, we can give a non-trivial
bound on the number of rounds for every instance. In particular our result yields an 
SDP-hierarchy based algorithm that matches the performance of the recent subexpo- 
nential algorithm of Arora, Barak and Steurer (FOCS 2010) in the worst case, but runs 
faster on a natural family of instances, thus further restricting the set of possible hard 
instances for Khot's Unique Games Conjecture. 

Our algorithm actually requires fewer than the $n^{O(r)}$ constraints specified by the $r$-th
level of the Lasserre hierarchy, and in some cases $r$ rounds of our program can be
evaluated in time $2^{O(r)}\,\mathrm{poly}(n)$.

1 Introduction



This paper is concerned with hierarchies of semidefinite programs (SDP's). Semidefinite
programs are an extremely useful tool in algorithms and in particular approximation
algorithms (e.g., [GW95, KMS98]). SDP's involve finding an integral (say
0/1) solution for some optimization problem, by using convex programming to find a
fractional/high-dimensional solution and then rounding it into an integral solution. Sherali
and Adams [SA90], Lovász and Schrijver [LS91], and, later, Lasserre [Las01], proposed
systematic ways, known as hierarchies, to make this convex relaxation tighter, thus ensuring
that the fractional solution is closer to an integral one. These hierarchies are parameterized
by a number $r$, called the level or number of rounds of the hierarchy. Given a program on $n$
variables, optimizing over the $r$-th level of the hierarchy can be done in time $n^{O(r)}$. The gap
between integral and fractional solutions decreases with $r$, and reaches zero at the $n$-th level,
where the program is guaranteed to find an optimal integral solution. The paper [Lau03]
surveys and compares the different hierarchies proposed in the literature; see also the recent
survey [CT10].

These semidefinite programming hierarchies have been of some interest in recent years,
since they provide natural candidate algorithms for many computational problems. In particular,
whenever the basic semidefinite or linear program provides a suboptimal approximation
factor, it makes sense to ask how many rounds of the hierarchy are required to
significantly improve upon this factor. Unfortunately, taking advantage of these hierarchies
has often been difficult, and while some algorithms (e.g., [ARV09]) can be encapsulated
in, say, level 3 or 4 of some hierarchies, there have been relatively few results (e.g.,
[Chl07, BCC+10]) that use higher levels to obtain new algorithmic results. In fact, there
has been more success in showing that high levels of hierarchies do not help for many computational
problems [ABLT06, STT07, GMPT07, RS09, KS09]. In particular for 3SAT and
several other NP-hard problems, it is known that it takes $\Omega(n)$ rounds of the strongest SDP
hierarchy (i.e., Lasserre) to improve upon the approximation rate achieved by the basic SDP
(or sometimes even simpler algorithms) [Sch08, Tul09].

Semidefinite hierarchies are of particular interest in the case of problems related to 
Khot's Unique Games Conjecture (UGC) [Kho02]. Several works have shown that for a 
wide variety of problems, the UGC implies that (unless P = NP) the basic semidefinite 
program cannot be improved upon by any polynomial-time algorithm [KKMO04, MOO05, 
Rag08]. Thus in particular the UGC predicts that for all these problems, it will take a super- 
constant (and in fact polynomial, under widely believed assumptions) number of hierarchy 
rounds to improve upon the basic SDP. Investigating this prediction, particularly for the 
Unique Games problem itself and other related problems such as Max Cut, Sparsest Cut 
and Small-Set Expansion, has been the focus of several works, and it is known that at 
least $(\log \log n)^{\Omega(1)}$ rounds are required for a non-trivial approximation [RS09, KS09] by a
natural (though not strongest possible) SDP hierarchy. However, no non-trivial upper bound 
was known prior to the current work, and so it was conceivable that these lower bounds can 
be improved to $\Omega(n)$.

Recently, Arora, Barak and Steurer [ABS10] gave a $2^{n^{\mathrm{poly}(\varepsilon)}}$-time algorithm for solving
the Unique Games and Small-Set Expansion problems (where $\varepsilon$ is the completeness parameter,
see below). However, their algorithm did not use semidefinite programming hierarchies,
and so does not immediately imply an upper bound on the number of rounds needed. 

1.1 Our results



Our main contribution is a new method to analyze and round SDP hierarchies. We elaborate 
more on our method in Section 2, but its high level description is that it uses global correlations
inside the high-dimensional SDP solution, combined with the hierarchy constraints, to
obtain a better rounding of this solution into an integral one. We believe this method can be 
of general utility, and in particular we use it here to give new algorithms for approximating 
constraint satisfaction problems on two-variable constraints (2-CSP's), that run faster than
the previously known algorithms for a natural family of instances. To state our results we 
need the notion of a threshold rank. 

Threshold rank of graphs and 2-CSPs. The $\tau$-threshold rank of a regular graph $G$, denoted
$\mathrm{rank}_{\ge\tau}(G)$, is the number of eigenvalues of the normalized adjacency matrix of $G$ that
are larger than $\tau$.^1 An instance $\Im$ of a Max 2-CSP problem consists of a regular graph $G_{\Im}$,
known as the constraint graph of $\Im$, over a vertex set $[n]$, where every edge $(i, j)$ in the graph
is labeled with a relation $\Pi_{ij} \subseteq [k] \times [k]$ ($k$ is known as the alphabet size of $\Im$). The value of
an assignment $x \in [k]^n$ to the variables of $\Im$, denoted $\mathrm{val}_{\Im}(x)$, is equal to the probability that
$(x_i, x_j) \in \Pi_{ij}$, where $(i, j)$ is a random edge in $G_{\Im}$. The objective value of $\Im$ is the maximum
of $\mathrm{val}_{\Im}(x)$ over all assignments $x$. We say that $\Im$ is $c$-satisfiable if $\Im$'s objective value is at
least $c$. We define $\mathrm{rank}_{\ge\tau}(\Im) = \mathrm{rank}_{\ge\tau}(G_{\Im})$. Our main result is the following:
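
To make the definition concrete, the following is a minimal NumPy sketch of how the threshold rank of a small regular graph could be computed numerically. The function names `threshold_rank` and `complete_graph` are ours and purely illustrative; the sketch assumes the graph is given as a dense symmetric weight matrix with equal row sums.

```python
import numpy as np

def threshold_rank(adj, tau):
    """Count eigenvalues of the normalized adjacency matrix that exceed tau.

    `adj` is assumed to be a symmetric non-negative weight matrix of a
    regular graph (all row sums equal). Dividing by the common row sum
    gives a matrix whose rows and columns sum to one.
    """
    adj = np.asarray(adj, dtype=float)
    row_sums = adj.sum(axis=1)
    if not np.allclose(row_sums, row_sums[0]):
        raise ValueError("expected a regular graph (equal row sums)")
    normalized = adj / row_sums[0]
    eigenvalues = np.linalg.eigvalsh(normalized)  # symmetric, so real spectrum
    return int(np.sum(eigenvalues > tau))

def complete_graph(n):
    """Adjacency matrix of the complete graph on n vertices (no self-loops)."""
    return np.ones((n, n)) - np.eye(n)

# K_5 has normalized eigenvalues 1 (once) and -1/4 (four times).
k5 = complete_graph(5)
# Two disjoint copies of K_5: the eigenvalue 1 now appears twice.
zeros = np.zeros((5, 5))
two_k5 = np.block([[k5, zeros], [zeros, k5]])
```

On these toy inputs, taking disjoint copies of a graph multiplies the number of eigenvalues above any fixed threshold, which is the mechanism behind instances with large threshold rank built from many copies of one instance.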

Theorem 1.1. There is a constant $c$ such that for every $\varepsilon > 0$, and every Max 2-CSP instance
$\Im$ with objective value $v$ and alphabet size $k$, the following holds: the objective value $\mathrm{sdpopt}(\Im)$
of the $r$-round Lasserre hierarchy for $r \ge k \cdot \mathrm{rank}_{\ge\tau}(\Im)/\varepsilon^c$ is within $\varepsilon$ of the objective value
$v$ of $\Im$, i.e., $\mathrm{sdpopt}(\Im) \le v + \varepsilon$. Moreover, there exists a polynomial time rounding scheme
that finds an assignment $x$ satisfying $\mathrm{val}_{\Im}(x) > v - \varepsilon$ given an optimal SDP solution as input.

Results for Unique Games constraints. We say that a Max 2-CSP instance is a Unique
Games instance if all the relations $\Pi_{ij}$ have the form that $(a, b) \in \Pi_{ij}$ iff $a = \pi_{ij}(b)$, where
$\pi_{ij}$ is a permutation of $[k]$. As mentioned above, the performance of SDP hierarchies on
Unique Games instances and related problems is of particular interest. We obtain somewhat
stronger quantitative results for Unique Games instances. Also, as remarked below, our
results are "morally stronger" in this case, since it's conceivable that the hardest instances
for these types of problems have small threshold rank. First, we show that for Unique
Games instances the threshold $\tau$ in Theorem 1.1 does not need to depend on the alphabet
size. Namely, we prove:

Theorem 1.2. There is an algorithm, based on rounding $r$ rounds of the Lasserre hierarchy,
and a constant $c$, such that for every $\varepsilon > 0$ and input Unique Games instance $\Im$ with
objective value $v$ and alphabet size $k$, satisfying $\mathrm{rank}_{\ge\tau}(\Im) < \varepsilon^c r/k$, where $\tau = \varepsilon^c$, the algorithm
outputs an assignment $x$ satisfying $\mathrm{val}_{\Im}(x) > v - \varepsilon$.

The Unique Games Conjecture is about a specific approximation regime for Unique 
Games. Given a Unique Games instance with optimal value $1 - \varepsilon$, the goal is to find an
assignment with value at least 1/2. 

^1 In this paper we only consider regular undirected graphs, although we allow non-negative weights and/or
parallel edges. Every such graph can be identified with its normalized adjacency matrix, whose $(i, j)$-th entry
is proportional to the weight of the edge $(i, j)$, with all row and column sums equalling one. Similarly, we
restrict our attention to 2-CSP's whose constraint graphs are regular. However, our definitions and results can
be appropriately generalized for non-regular graphs and 2-CSPs as well.

We also show that in this case a sublinear (and in fact a small root) number of rounds
suffices to get such an approximation in the worst case, regardless of the instance's threshold
rank. Moreover, we also show that such an approximation can be obtained in a number of
rounds that depends on the $\tau$-threshold rank for $\tau$ that is close to 1 (as opposed to the small
value of $\tau$ needed for Theorems 1.1 and 1.2).

Theorem 1.3. There is an algorithm, based on rounding $r$ rounds of the Lasserre hierarchy,
and a constant $c$, such that for every $\varepsilon > 0$ and input Unique Games instance $\Im$ with
objective value $1 - \varepsilon$ and alphabet size $k$, satisfying $r \ge ck \cdot \min\{n^{c\varepsilon^{1/3}}, \mathrm{rank}_{\ge 1 - c\varepsilon}(\Im)\}$, the
algorithm outputs an assignment $x$ satisfying $\mathrm{val}_{\Im}(x) > 1/2$.

Examples of graphs with small threshold rank. Many interesting graph families have
small $\tau$-threshold rank for some small constant $\tau$. Random degree-$d$ graphs have $\tau$-threshold
rank equal to 1 for any $\tau > c/\sqrt{d}$. More generally, graphs where small subsets of vertices
have bounded edge-expansion, referred to as small-set expanders, also have small threshold
rank. For instance, if every set of size $o(n)$ expands by at least $\mathrm{poly}(\varepsilon)$ in a graph $G$, then
$\mathrm{rank}_{\ge 1 - \varepsilon}(G)$ is at most $n^{\mathrm{poly}(\varepsilon)}$ [ABS10]. Generalizing this result, [Ste10] showed that if in
a graph $G$ every set of $o(n)$ vertices has near-perfect expansion, then it implies upper
bounds on $\mathrm{rank}_{\ge\tau}(G)$ for $\tau$ close to 0.

Also, as noted in [ABS10], hypercontractive graphs (i.e., graphs whose 2 to 4 operator
norm is bounded) have at most polylogarithmic $\tau$-threshold rank for every constant $\tau > 0$.
For several 2-CSP's such as Max Cut, Unique Games, Small-Set Expansion, Sparsest
Cut, the constraint graphs for the canonical "problematic instances" (i.e., integrality gap 
examples [FS02, KV05, KS09, RS09]) are all hypercontractive, since they are based on 
either the noisy Gaussian graph or noisy Boolean cube. In fact, it is conceivable that the 
Small-Set Expansion problem is trivial on graphs with large threshold rank, in the sense 
that we do not know of any example of an instance having, say, $\log^{\omega(1)} n$ 0.99-threshold
rank, and objective value smaller than 1/2. (For the Unique Games and Max Cut problems
it is trivial to construct instances with large threshold rank by taking many disjoint copies 
of the same instance, though it could still be the case that the hardest instances are the 
ones with small threshold rank.) On the other hand, for other 2-CSPs such as Label Cover, 
some natural hard instances have linear threshold-rank. For example this is the case if one 
considers the natural "clause vs. variable" or "clause vs. clause" 2-CSP obtained from 
random instances of 3SAT (which is not surprising given that a non-trivial approximation 
for random 3SAT requires $\Omega(n)$ levels of the Lasserre hierarchy [Sch08]).

Algorithm efficiency. Our algorithm actually does not require the full power of the
Lasserre hierarchy. First, we can use the relaxed variant with approximate constraints studied
in [KS09, RS09, KPS10]. Second, in the proof of Theorem 1.3, we don't need to utilize
the constraints on all $r$-sized subsets of $n$ variables, but rather sufficiently many random
sets suffice. As a result, we can implement our $r$-round algorithm in time $2^{O(r)}\,\mathrm{poly}(n)$.

1.2 Related works 

Subspace enumeration algorithms. For Unique Games and related problems, previous
works [KT07, Kol10, ABS10] used subspace enumeration to give algorithms with similar
running time to Theorem 1.3 in the case that the threshold rank of the label extended graph
of the instance is small. This is known to be a stronger condition on the instances than
bounding the threshold rank of the constraint graph. The only known bound on the $1 - \varepsilon$
threshold rank of the label extended graph in terms of the $1 - \varepsilon$ threshold rank of the constraint
graph loses a factor of about $n^{\varepsilon}$ [ABS10]. These subspace enumeration algorithms
also only applied to nearly satisfiable instances (whose objective value is close to 1), and 
so did not give guarantees comparable to Theorems 1.1 and 1.2. As mentioned below in 
Section 2, SDP-based algorithms have some robustness advantages over spectral techniques. 
SDP hierarchies are also easily shown to yield polynomial-time approximation schemes for
2CSPs whose constraint graphs can have very high threshold rank such as bounded tree 
width graphs and regular planar graphs (or more generally any hyperfinite family of graphs, 
see e.g. [HKNO09] and the references therein). 

Approximation schemes for (pseudo) dense CSP's. For general 2-CSP's, several works
gave polynomial-time approximation schemes for dense and pseudo-dense instances [FK99,
ACOH+10, COCF10]. Our work generalizes these results, since pseudo-density is a
stronger condition than having a constraint graph of low threshold rank. Furthermore, for
an $\varepsilon$-approximation the degree of the instance needed by these works is exponential in $1/\varepsilon$,
while the results of this work apply even on random graphs of degree $\mathrm{poly}(1/\varepsilon)$.

Analyzing SDP hierarchies. Using very different methods, Chlamtac [Chl07] and
Bhaskara et al. [BCC+10] gave LP/SDP-hierarchy based algorithms for hypergraph coloring
and the densest subgraph problem respectively. As mentioned above, several works gave
lower bounds for LP/SDP hierarchies. In particular [RS09, KS09] showed that approximations
such as those achieved in Theorem 1.3 for the Unique Games problem require $(\log \log n)^{\Omega(1)}$
rounds of a relaxed variant of the Lasserre hierarchy. This relaxed variant captures our
hierarchy as well. Schoenebeck [Sch08] proved that achieving a non-trivial approximation
for 3SAT on random instances requires $\Omega(n)$ rounds in the Lasserre hierarchy, while Tulsiani
[Tul09] showed that Lasserre lower bounds are preserved under common types of
NP-hardness reductions.

In a concurrent and independent work, Guruswami and Sinop [GS11] gave results very
similar to ours. They also use the Lasserre hierarchy to get an approximation scheme with 
similar performance to our Theorem 1.1 for 2-CSPs, and in fact even consider generaliza- 
tions involving additional (approximate) global linear constraints. They also get essentially 
the same results for Unique Games as our Theorem 1.3. Furthermore, their rounding algo- 
rithm is the same as ours. However, there are some differences both in results and the proof. 
First, although [GS11] use a notion similar to our local-to-global correlation, they view it
differently, and interestingly relate it to the problem of column selection for low rank ap- 
proximations of matrices. Also, apart from the special case of unique constraints, they work 
with the threshold rank of the label extended graph, as opposed to the constraint graph as 
is the case here (however for binary alphabet these two graphs coincide). Their analysis 
relies on the full power of the Lasserre hierarchy, whereas we show that a weaker hierarchy 
is sufficient in the Unique Games case, and can even be done faster (i.e., $\exp(r)\,\mathrm{poly}(n)$ vs.
$n^{O(r)}$).

2 Our techniques 

We now describe, on a very high and imprecise level, the ideas behind our rounding algo- 
rithm and its analysis. A semidefinite programming relaxation of an optimization problem 
yields a set of vectors $v_1, \ldots, v_n$ satisfying certain conditions and achieving some objective
value $c$. The goal of a rounding algorithm is to transform this set of vectors into, say, a
$+1/-1$ solution, satisfying the same conditions and achieving value $c'$ that is close to $c$.
At a very high level, our main result is that if these vectors have some non-trivial global 
correlation, then a good rounding can be achieved with a non-trivially small number of hi- 
erarchy rounds. Our second observation is that in several cases, the vectors corresponding 
to a good SDP solution can be shown to have significant mass inside some low-dimensional 
subspace, and that implies a lower bound on their global correlation. Below we elaborate 
on what we mean, using the Max Cut problem (which is a special case of Unique Games) 
as an illustrative example. Our result for Max Cut is worked out in more detail in Section 4. 

Rounding SDP's using a small basis. The SDP solution for the Max Cut problem consists
of a sequence $\mathcal{V} = v_1, \ldots, v_n$ of unit vectors, and the objective value is the expectation of
$(1 - \langle v_i, v_j \rangle)/2$ over all edges $\{i, j\}$ in the input graph. Note that in the case that the vectors
$v_1, \ldots, v_n$ are one-dimensional unit vectors (i.e., $v_i \in \{\pm 1\}$), $\mathcal{V}$ exactly corresponds to a cut
in the graph, and the objective value measures the fraction of edges cut. Now, suppose that
you could find $r$ vectors $v_{i_1}, \ldots, v_{i_r} \in \mathcal{V}$, which we'll call the basis vectors, such that every
other $v \in \mathcal{V}$ has some significant projection $\rho$ into the span of $v_{i_1}, \ldots, v_{i_r}$. That is, if we let
$P$ be the projection operator corresponding to this space, then for every $v \in \mathcal{V}$, $\|Pv\|_2 \ge \rho$.
It turns out that in this case, if $\rho$ is sufficiently close to 1 and the vector solution $\mathcal{V}$ satisfies
$r + 2$ rounds of an appropriate SDP hierarchy, then we can round $\mathcal{V}$ to achieve a very good
cut. The intuition behind this is the following: the constraints of $r + 2$ hierarchy rounds
allow us to essentially assume without loss of generality that the vectors $v_{i_1}, \ldots, v_{i_r}$ are
one-dimensional. That is, after applying an appropriate rotation, we can think of each one of
them as a vector of the form $(\pm 1, 0, \ldots, 0)$. Moreover, our assumption implies that every
other vector in $\mathcal{V}$ has a magnitude of at least $\rho$ in its first coordinate. Now one can show that
simply rounding each vector to the sign of its first coordinate will result in a $\pm 1$ assignment
to the vertices corresponding to a good cut.

Local to global correlation. From the above discussion, our goal of rounding SDP hierarchies
is reduced to finding a small number of basis vectors $v_{i_1}, \ldots, v_{i_r}$ such that every
(or at least most) other vector in the solution $\mathcal{V}$ has very large projection into their span.
But why should such vectors exist? We show that we can assume they exist if the original
Max Cut instance has small threshold rank. The latter is a condition that, as mentioned
above, holds for many natural families of instances, including the canonical "hard
instances" that are known to fool the GW algorithm: the noisy sphere and noisy Gaussian
graphs [FS02, RS09]. The key concept behind our proof is the notion of local vs. global
correlations. It is a very well known property of expander graphs that random edges behave
similarly to pairs of independently chosen vertices with respect to some tests. Specifically,
if $G$ is an $n$-vertex expander in the sense that the second largest eigenvalue of the normalized
adjacency matrix $A_G$ is at most $\varepsilon$, and $f$ is a bounded function mapping vertices to numbers,
then we know that $\mathbb{E}_{i,j}[|f(i) - f(j)|^2] \in (1 \pm O(\varepsilon))\, \mathbb{E}_{i \sim j}[|f(i) - f(j)|^2]$, where the former
expectation is over pairs of vertices and the latter is over pairs connected by an edge. In
other words, if $f$ is locally correlated over the edges of an expander graph, then it is also
globally correlated. In fact, this is easily shown to hold even
if $f$ maps vertices not into numbers but into vectors, i.e., if $v_1, \ldots, v_n$ are unit vectors that
are locally correlated over the edges of $G$ then they are also globally correlated. Indeed,
this property of expanders has been used in the work of [AKK+08], who showed that the
basic SDP program for Unique Games can be successfully rounded if the input graph is an
expander.
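
The expander property quoted above can be checked by a short spectral computation. The sketch below is ours and relies on two assumptions that the text does not spell out: expectations over vertices are taken with respect to the uniform distribution (which matches a random edge endpoint in a regular graph), and all eigenvalues of $A_G$ other than the trivial eigenvalue 1 lie in $[-\varepsilon, \varepsilon]$, a two-sided version of the bound on the second eigenvalue. Write $\mu = \mathbb{E}_i f(i)$, $g = f - \mu$, and $\langle g, h \rangle = \mathbb{E}_i\, g(i) h(i)$.

```latex
\begin{align*}
\mathbb{E}_{i,j}\,|f(i) - f(j)|^2
  &= 2\,\mathbb{E}_i f(i)^2 - 2\mu^2 = 2\,\langle g, g \rangle, \\
\mathbb{E}_{i \sim j}\,|f(i) - f(j)|^2
  &= 2\,\mathbb{E}_i f(i)^2 - 2\,\langle f, A_G f \rangle
   = 2\,\langle g, g \rangle - 2\,\langle g, A_G g \rangle, \\
|\langle g, A_G g \rangle| \le \varepsilon\,\langle g, g \rangle
  &\quad\Longrightarrow\quad
  \mathbb{E}_{i \sim j}\,|f(i) - f(j)|^2 \in (1 \pm \varepsilon)\,\mathbb{E}_{i,j}\,|f(i) - f(j)|^2 .
\end{align*}
```

The second line uses $A_G \mathbf{1} = \mathbf{1}$ and $\langle g, \mathbf{1} \rangle = 0$, so the constant part of $f$ cancels. For small $\varepsilon$ the last containment is equivalent, up to the constant hidden in $O(\varepsilon)$, to the form stated in the text.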

Our starting point is to observe that a somewhat similar, though much weaker condition 
holds even when the graph has at most, say, $r/100$ eigenvalues larger than $\varepsilon$. In this case
it's possible to show that, say, if (*) $\mathbb{E}_{i \sim j} \langle v_i, v_j \rangle^2 \ge 100\sqrt{\varepsilon}$, then (**) $\mathbb{E}_{i,j} \langle v_i, v_j \rangle^2 \ge 1/r$
(see Section 6). If r is super-constant, the condition (**) does not seem a-priori useful for 
obtaining a good integral solution. Indeed, the standard integrality gap example of Max 
Cut is a graph with fairly small (polylogarithmic) number of large eigenvalues, but no good 
integral solution. However, (**) does imply that we can find at least one vector $v_{i_1}$ such
that $\mathbb{E}_j \langle v_{i_1}, v_j \rangle^2 \ge 1/r$. We can now replace each vector $v \in \mathcal{V}$ with its projection into the
orthogonal space to $v_{i_1}$ and continue until we either get stuck or find a basis $v_{i_1}, \ldots, v_{i_r}$ such
that (almost all) vectors $v \in \mathcal{V}$ have most of their mass in $\mathrm{Span}\{v_{i_1}, \ldots, v_{i_r}\}$, in which case
we can successfully round the solution. The only way we can get stuck is if at some point 
we get that (*) is violated. Now, in the case of Max Cut, if (*) was violated initially, then 
the value of the SDP would be about 1/2, which is trivial to round by just taking a random 
cut. To show that we can easily round even when (*) is violated at some later point in the 
process, it's useful to switch to the distribution view of SDP hierarchies. 
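
The iterative selection described in this paragraph can be illustrated on plain vectors. The following NumPy sketch is only a toy version under our own assumptions: it works directly with an explicit list of vectors rather than with an SDP hierarchy solution, it normalizes the residual direction before scoring, and the names `greedy_basis`, `threshold` and `max_steps` are hypothetical. It repeatedly picks the vector whose residual direction has the largest average squared inner product with all residuals, projects that direction out, and stops when no direction scores above the threshold.

```python
import numpy as np

def greedy_basis(vectors, threshold, max_steps):
    """Greedily pick indices by average squared correlation with the residuals.

    vectors: (n, d) array whose rows are the vectors v_1, ..., v_n.
    threshold: stop when the best average squared inner product is below it.
    max_steps: upper bound on the number of basis vectors selected.
    Returns the chosen indices and the final residual vectors.
    """
    residual = np.array(vectors, dtype=float)
    chosen = []
    for _ in range(max_steps):
        norms = np.linalg.norm(residual, axis=1)
        alive = norms > 1e-9
        if not alive.any():
            break  # every vector already lies in the span of the chosen ones
        directions = np.zeros_like(residual)
        directions[alive] = residual[alive] / norms[alive, None]
        # scores[i] = average over j of <direction_i, residual_j>^2
        scores = np.mean((directions @ residual.T) ** 2, axis=1)
        best = int(np.argmax(scores))
        if scores[best] < threshold:
            break  # "stuck": no direction has enough global correlation
        chosen.append(best)
        u = directions[best]
        # project every residual onto the orthogonal complement of u
        residual = residual - np.outer(residual @ u, u)
    return chosen, residual
```

For vectors that all lie in a two-dimensional subspace, two selection steps leave residuals that are numerically zero, which corresponds to the case where the basis captures essentially all of the mass.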

Distribution view of SDP's. Another, often beneficial way to view SDP hierarchies is
as providing distributions on integral solutions (see Section 3.2). In this view, for every set
of $r + 2$ vertices $i_1, \ldots, i_r, i_{r+1}, i_{r+2}$, the SDP hierarchy provides a distribution $X_{i_1}, \ldots, X_{i_{r+2}}$
over $\pm 1$. Moreover, we require that distributions on overlapping sets be consistent, and
that for every two variables $i, j$ the expectation $\mathbb{E}[X_i X_j]$ equal the inner product
$\langle v_i, v_j \rangle$ of the corresponding vectors. The challenge in rounding the SDP is that there is
not necessarily a way to sample simultaneously the random variables $X_1, \ldots, X_n$ in some
consistent way. The projection of a vector $v_j$ into the span of $v_{i_1}, \ldots, v_{i_r}$ turns out to capture
(an appropriate notion of) the mutual information between the variable $X_j$ and the variables
$X_{i_1}, \ldots, X_{i_r}$. Looked at from this viewpoint, our rounding algorithm involves choosing an
assignment from the distribution for the basis vertices, and conditioning on its value. As
long as (**) holds, we can find a random variable $X_i$ such that conditioning on $X_i$ will
significantly decrease the entropy of the remaining variables. When we get stuck and (*) is
violated, it means that for a typical edge $i \sim j$, the random variables $X_i$ and $X_j$ are close to
being statistically independent. This means that just sampling each $X_i$ independently will
give approximately the same value on a typical constraint.

Threshold rank vs global correlation. Whenever the graph has a small number of large
eigenvalues, the condition that local correlation implies global correlation holds. This is useful
to simulate eigenspace enumeration algorithms such as those used by [KT07, Kol10, ABS10,
Ste10], since in the case of Unique Games (and other related problems), a good SDP solution
must be locally well correlated. But the notion of local to global correlation is somewhat
more general and robust than having small threshold rank. For example, adding $\sqrt{n}$ isolated
vertices to a graph will increase correspondingly the number of eigenvectors with eigenvalue 1,
but will actually not change by much the local to global correlation. This captures to a
certain extent the fact that SDP-based solutions are more robust than spectral-based algorithms.
(A similar example of this phenomenon is that adding a tiny bipartite disjoint
graph to the input graph makes the smallest eigenvalue become $-1$, but does not change
by much the value of the Goemans-Williamson SDP.) We hope that this robustness of the
SDP-based approach will enable further improvements in the future.

Remark 2.1. Theorem 1.3 considers a different parameter than Theorems 1.1 and 1.2. The
latter two results consider threshold ranks for a small (i.e., close to 0) threshold $\tau$, and
achieve a very good approximation. In contrast, Theorem 1.3 considers a threshold $\tau$ that
is close to 1, but only achieves a rough approximation (corresponding to the approximation
guarantee relevant to the Unique Games Conjecture). This is also manifested in some
technical differences in the proofs.

Organization 

We begin by fixing notation and a few formal definitions in the next section. For the pur- 
pose of exposition, we first present an algorithm for Max Cut on low-rank graphs using the 
Lasserre hierarchy in Section 4. Following this, the general algorithm for 2-CSPs on low- 
rank graphs is presented in Section 5. The connection between local and global correlations 
in low-rank graphs that is central to our algorithms, is outlined in Section 6. To implement 
our general approach in a hierarchy weaker than the Lasserre hierarchy, we outline an argument
to obtain a low-rank approximation to any set of vectors in Section 7. The final section
(Section 8) of the paper is devoted to a subexponential-time algorithm for Unique Games.

3 Preliminaries 

We will use capital letters X, Y to denote random variables, and lower-case letters to denote 
assignments to these random variables. 

For a real-valued random variable $X$, let $\mathrm{Var}[X]$ denote its variance. In this work, we
will use random variables taking values over a range $[k] = \{1, \ldots, k\}$. For a random variable
$X$ over $[k]$, and $a \in [k]$, let $X_a$ denote the indicator of the event that $X = a$. We define the
variance of $X$ to be

$\mathrm{Var}[X] := \sum_{a \in [k]} \mathrm{Var}[X_a] = 1 - \mathrm{CP}(X)$.   (3.1)

The collision probability of $X$ is defined as

$\mathrm{CP}(X) := \mathbb{P}_{X, X'}\{X = X'\}$,

where $X'$ is an independent copy of $X$ (so that the sequence $X, X'$ is i.i.d.). It is easy to see
that the variance and collision probability are related by

$\mathrm{CP}(X) = 1 - \mathrm{Var}[X]$.
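
As a quick numerical check of this relation, the following sketch (ours; the function names are illustrative) represents a $[k]$-valued random variable by its probability vector, computes the variance as the sum of the variances of the indicators $X_a$, and computes the collision probability as the sum of squared probabilities.

```python
import numpy as np

def collision_probability(p):
    """P{X = X'} for two independent draws from the probability vector p."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p ** 2))

def label_variance(p):
    """Sum over labels a of Var[X_a], where X_a is the indicator of X = a."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * (1.0 - p)))

# A distribution on three labels; any probability vector would do.
p_example = [0.5, 0.25, 0.25]
```

Each indicator $X_a$ has variance $p_a(1 - p_a)$, and summing over $a$ gives $1 - \sum_a p_a^2$, which is why the two functions always add up to one.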

For two jointly-distributed random variables $X, Y$, let $(X \mid Y = y)$ denote the random
variable $X$ conditioned on the event that $Y = y$. If it is clear from the context, we write $(X \mid y)$
for $(X \mid Y = y)$. We will denote by $\mathbb{E}_{\{Y\}} \mathrm{Var}[X \mid Y]$ the following quantity:

$\mathbb{E}_{\{Y\}} \mathrm{Var}[X \mid Y] := \mathbb{E}_{y \sim Y} \bigl[ \mathrm{Var}[(X \mid Y = y)] \bigr]$.
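
The expected conditional variance can be computed directly from a joint probability table. The sketch below is ours and assumes the joint distribution of $(X, Y)$ is given as a matrix with `joint[a, b]` equal to the probability that $X = a$ and $Y = b$; the variance of a $[k]$-valued variable is the sum of indicator variances used in this section.

```python
import numpy as np

def label_variance(p):
    """Sum over labels a of Var[X_a] for the probability vector p."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * (1.0 - p)))

def expected_conditional_variance(joint):
    """Average of Var[(X | Y = y)] over y drawn from the marginal of Y.

    joint[a, b] is assumed to be P{X = a, Y = b}.
    """
    joint = np.asarray(joint, dtype=float)
    p_y = joint.sum(axis=0)  # marginal distribution of Y
    total = 0.0
    for b, weight in enumerate(p_y):
        if weight > 0:
            # conditional distribution of X given Y = b
            total += weight * label_variance(joint[:, b] / weight)
    return total

# X = Y uniform on two labels: conditioning on Y determines X.
joint_equal = [[0.5, 0.0], [0.0, 0.5]]
# X and Y independent and uniform: conditioning on Y changes nothing.
joint_independent = [[0.25, 0.25], [0.25, 0.25]]
```

In the first example the conditional variance drops to zero, while in the second it stays at the unconditional value of 1/2; these are the two extremes between which conditioning in the rounding algorithm operates.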

3.1 Unique Games



Definition 3.1. An instance of Unique Games consists of a graph $G = (V, E)$, a label set
$[k] = \{1, \ldots, k\}$ and a bijection $\pi_{ij} : [k] \to [k]$ for every edge $(i, j) \in E$. A labeling
$\ell : V \to [k]$ is said to satisfy an edge $(i, j)$ if $\pi_{ij}(\ell(i)) = \ell(j)$. The goal is to find a labeling
$\ell : V \to [k]$ that satisfies the maximum number of edges, namely,

maximize $\mathbb{P}_{(i,j) \in E}\{\pi_{ij}(\ell(i)) = \ell(j)\}$.
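
As an illustration of the objective, here is a minimal sketch (ours) that evaluates a labeling on a small instance. It assumes labels are numbered from 0 rather than 1, and that each edge is stored as a triple `(i, j, pi)` where `pi[a]` is the image of label `a` under the bijection of that edge; these representation choices and the function name are hypothetical.

```python
def unique_games_value(edges, labeling):
    """Fraction of edges (i, j, pi) with pi[labeling[i]] == labeling[j]."""
    satisfied = sum(1 for i, j, pi in edges if pi[labeling[i]] == labeling[j])
    return satisfied / len(edges)

# Max Cut on a triangle, viewed as Unique Games with two labels:
# every edge carries the swap permutation, so an edge is satisfied
# exactly when its endpoints receive different labels.
swap = [1, 0]
triangle = [(0, 1, swap), (1, 2, swap), (0, 2, swap)]
cut_labeling = [0, 1, 0]
```

On the triangle no labeling satisfies all three edges, and the labeling above satisfies two of them.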



3.2 Local Distributions 

Let $V = [n]$ be a set of vertices and let $[k]$ be a set of labels. An $m$-local distribution is
a distribution $\mu_T$ over the set of assignments $[k]^T$ of the vertices of some set $T \subseteq V$ of
size at most $m + 2$. (The choice of $m + 2$ is immaterial but will be convenient later on.)
A collection of $m$-local distributions $\{\mu_T\}_{T \subseteq V,\, |T| \le m+2}$ is consistent if for all $T, T' \subseteq V$ with
$|T|, |T'| \le m + 2$, the distributions $\mu_T$ and $\mu_{T'}$ are consistent on their intersection $T \cap T'$.
We sometimes will view these distributions as random variables, hence writing $X_i^{(T)}$ for the
random variable over $[k]$ that is distributed according to the label that $\mu_T$ assigns to $i$,
and refer to a collection $X_1, \ldots, X_n$ of $m$-local random variables. However, we stress that
these are not necessarily jointly distributed random variables, but rather for any subset of
at most $m + 2$ of them, one can find a sample space on which they are jointly distributed.
For succinctness, we omit the superscript for variables $X_i^{(T)}$ whenever it is clear from the
context. For example, we will use $(X_i \mid X_S)$ as short for the random variable obtained by
conditioning $X_i^{(S \cup \{i\})}$ on the variables $\{X_j^{(S \cup \{i\})}\}_{j \in S}$,^2 and use $\mathbb{P}\{X_i = X_j \mid X_S\}$ as short for the
$[0, 1]$-valued random variable $\mathbb{P}\{X_i^{(S \cup \{i,j\})} = X_j^{(S \cup \{i,j\})} \mid X_S^{(S \cup \{i,j\})}\}$.
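
The consistency condition can be checked mechanically on small examples. The sketch below is ours: it assumes each local distribution is stored as a dictionary from assignment tuples to probabilities, keyed by the sorted tuple of vertices it covers, and the names `marginal` and `consistent` are illustrative.

```python
def marginal(dist, vertices, subset):
    """Marginalize a distribution over assignments to `vertices` onto `subset`.

    dist maps assignment tuples (ordered like `vertices`) to probabilities.
    """
    positions = [vertices.index(v) for v in subset]
    result = {}
    for assignment, prob in dist.items():
        key = tuple(assignment[p] for p in positions)
        result[key] = result.get(key, 0.0) + prob
    return result

def consistent(local, tol=1e-9):
    """Check that all pairs of local distributions agree on their intersection.

    local maps a sorted tuple of vertices T to its distribution mu_T.
    """
    sets = list(local)
    for s in sets:
        for t in sets:
            common = tuple(sorted(set(s) & set(t)))
            ms = marginal(local[s], list(s), common)
            mt = marginal(local[t], list(t), common)
            keys = set(ms) | set(mt)
            if any(abs(ms.get(k, 0.0) - mt.get(k, 0.0)) > tol for k in keys):
                return False
    return True

# On a triangle, let every pair of vertices take opposite labels, each
# ordering with probability 1/2. All single-vertex marginals are uniform,
# so the collection is consistent, yet no joint distribution on three
# binary labels makes all three pairs disagree at once.
anti = {(0, 1): 0.5, (1, 0): 0.5}
triangle_local = {(0, 1): dict(anti), (1, 2): dict(anti), (0, 2): dict(anti)}
# Two local distributions that disagree on the label of vertex 1.
clash_local = {(0, 1): {(0, 0): 1.0}, (1, 2): {(1, 1): 1.0}}
```

The triangle example shows concretely why consistent local random variables need not be jointly distributed.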



3.3 Lasserre Hierarchy 

Let $U$ be a Unique Games instance with constraint graph $G = (V, E)$, label set $[k] =
\{1, \ldots, k\}$, and bijections $\{\pi_{ij}\}_{ij \in E}$. An $m$-round Lasserre solution consists of $m$-local random
variables $X_1, \ldots, X_n$ and vectors $v_{S,\alpha}$ for all vertex sets $S \subseteq V$ with $|S| \le m + 2$ and all
local assignments $\alpha \in [k]^S$. A Lasserre solution is feasible if the local random variables are
consistent with the vectors, in the sense that for all $S, T \subseteq V$ and $\alpha \in [k]^S$, $\beta \in [k]^T$ with
$|S \cup T| \le m + 2$, we have

$\langle v_{S,\alpha}, v_{T,\beta} \rangle = \mathbb{P}\{X_S = \alpha, X_T = \beta\}$.



The objective is to maximize the following expression:

$\mathbb{E}_{ij \in E}\, \mathbb{P}\{\pi_{ij}(X_i) = X_j\}$.

An important consequence of the existence of the vectors $v_{S,\alpha}$ is that for every set $S \subseteq V$
with $|S| \le m$ and local assignment $x_S \in [k]^S$, the matrix

$\bigl( \mathrm{Cov}(X_{ia}, X_{jb} \mid X_S = x_S) \bigr)_{i, j \in V,\; a, b \in [k]}$

is positive semidefinite.

^2 Strictly speaking, the range of the random variable $(X_j \mid X_S)$ consists of random variables with range $[k]$. For every
possible value $x_S$ for $X_S$, one obtains a $[k]$-valued random variable $(X_j \mid X_S = x_S)$.

4 Warmup - MaxCut Example



For the sake of exposition, we first present an algorithm for the Max Cut problem on low-rank
graphs. In the Max Cut problem, the input consists of a graph $G = (V, E)$ and the goal
is to find a cut $S \cup \bar{S} = V$ of the vertices that maximizes the number of edges crossing, i.e.,
maximizes $|E(S, \bar{S})|$.

The Goemans-Williamson SDP relaxation for the problem assigns a unit vector $v_i$ for
every vertex $i \in V$, so as to maximize the average squared length $\mathbb{E}_{ij \in E} \|v_i - v_j\|^2$ of the
edges. Formally, the SDP relaxation is given by

maximize $\mathbb{E}_{ij \in E} \|v_i - v_j\|^2$ subject to $\|v_i\|_2^2 = 1 \;\; \forall i \in V$.

Stronger SDP relaxations produced by hierarchies such as Sherali-Adams and Lasserre 
hierarchy also yield probability distributions over local assignments. 

More precisely, given an $m$-round Lasserre SDP solution, it can be associated with a
set of $m$-local random variables $X_1, \ldots, X_n$ taking values in $\{-1, 1\}$. For an edge $(i, j)$, its
contribution to the SDP objective value ($\|v_i - v_j\|^2$) is equal to the probability that the edge
$(i, j)$ is cut under the distribution of local assignments $\mu_{ij}$, namely,

$\mathbb{P}_{\mu_{ij}}\{X_i \ne X_j\} = \|v_i - v_j\|^2$.

Consequently, in order to obtain a cut with value close to the SDP objective, it is sufficient
to jointly sample $X_1, \ldots, X_n$, such that on every edge $(i, j)$ the distribution of $X_i$ and
$X_j$ is close to the corresponding local distribution $\mu_{ij}$. However, the variables $X_1, \ldots, X_n$ are
not jointly distributed, and hence cannot all be sampled together.

As a first attempt, let us suppose we sample each $X_i$ independently from its associated
marginal $\mu_i$. If on most edges $(i, j)$, the distribution of the resulting samples $X_i, X_j$ is close to
$\mu_{ij}$, then we are done. On an edge $(i, j)$, the local distribution $\mu_{ij}$ is far from the independent
sampling distribution $\mu_i \times \mu_j$ only if the random variables $X_i, X_j$ are correlated. Henceforth,
these correlations across the edges will be referred to as "local correlations". A natural
measure for correlations that we will utilize here is defined as $\mathrm{Cov}(X_i, X_j) = \mathbb{E}[X_i X_j] -
\mathbb{E}[X_i]\, \mathbb{E}[X_j]$. Using this measure, the statistical distance between independent sampling
($\mu_i \times \mu_j$) and correlated sampling ($\mu_{ij}$) is bounded by

$\|\mu_{ij} - \mu_i \times \mu_j\|_1 \le |\mathrm{Cov}(X_i, X_j)|$.

(See Lemma 5.1 for a more general version of the above bound). 
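The bound can be checked numerically on a single pair of $\pm 1$ variables. The sketch below is an illustration only; it assumes the $\ell_1$ distance is the sum of absolute differences over the four outcomes, in which case each outcome $(a,b)$ contributes $|ab\,\mathrm{Cov}(X_i,X_j)/4|$ and the bound holds with equality. The joint distribution is made up for the example.

```python
from itertools import product


def l1_vs_cov(joint):
    """joint maps (a, b) in {-1, 1}^2 to P[X_i = a, X_j = b].

    Returns (||mu_ij - mu_i x mu_j||_1, |Cov(X_i, X_j)|).
    """
    vals = (-1, 1)
    p_i = {a: sum(joint[(a, b)] for b in vals) for a in vals}
    p_j = {b: sum(joint[(a, b)] for a in vals) for b in vals}
    l1 = sum(abs(joint[(a, b)] - p_i[a] * p_j[b]) for a, b in product(vals, vals))
    e_ij = sum(a * b * p for (a, b), p in joint.items())
    e_i = sum(a * p for a, p in p_i.items())
    e_j = sum(b * p for b, p in p_j.items())
    return l1, abs(e_ij - e_i * e_j)


# Hypothetical local distribution on one edge.
joint = {(-1, -1): 0.4, (-1, 1): 0.1, (1, -1): 0.2, (1, 1): 0.3}
l1, cov = l1_vs_cov(joint)
```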

On the flip side, the existence of correlations makes the problem of sampling $X_1, \ldots, X_n$ easier! If two variables $X_i, X_j$ are correlated, then sampling/fixing the value of $X_i$ reduces the uncertainty in the value of $X_j$. More precisely, conditioning on the value of $X_i$ reduces the variance of $X_j$ as shown below:

$$\mathbb{E}_{\{X_i\}} \mathrm{Var}[X_j \mid X_i] = \mathrm{Var}[X_j] - \frac{[\mathrm{Cov}(X_i, X_j)]^2}{\mathrm{Var}[X_i]} .$$
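For two-valued variables the variance drop can be verified directly: $\mathbb{E}[X_j \mid X_i]$ is a linear function of $X_i$, so the variance removed by conditioning is $\mathrm{Cov}(X_i,X_j)^2/\mathrm{Var}[X_i]$. The following sketch is an illustration under the assumption that both values of $X_i$ have positive probability; the function name is hypothetical.

```python
def variance_drop(joint):
    """joint maps (a, b) to P[X_i = a, X_j = b] for a, b in {-1, 1}.

    Returns (E_{X_i} Var[X_j | X_i], Var[X_j] - Cov(X_i, X_j)^2 / Var[X_i]).
    Assumes both values of X_i occur with positive probability.
    """
    vals = (-1, 1)
    p_i = {a: sum(joint[(a, b)] for b in vals) for a in vals}
    e_i = sum(a * p_i[a] for a in vals)
    e_j = sum(b * joint[(a, b)] for a in vals for b in vals)
    e_ij = sum(a * b * joint[(a, b)] for a in vals for b in vals)
    var_i = 1 - e_i ** 2  # X_i^2 = 1 for +-1 variables
    var_j = 1 - e_j ** 2
    cov = e_ij - e_i * e_j
    cond = 0.0
    for a in vals:
        mean = sum(b * joint[(a, b)] for b in vals) / p_i[a]  # E[X_j | X_i = a]
        cond += p_i[a] * (1 - mean ** 2)
    return cond, var_j - cov ** 2 / var_i
```

For instance, with probabilities 0.4, 0.1, 0.2, 0.3 on the outcomes $(-1,-1), (-1,1), (1,-1), (1,1)$, both returned quantities equal 0.8.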

Therefore, if we pick an $i \in V$ at random and fix its value, then the expected decrease in the variance of all the other variables is given by,

$$\mathbb{E}_{j\in V}\mathrm{Var}[X_j] - \mathbb{E}_{i\in V}\,\mathbb{E}_{\{X_i\}}\,\mathbb{E}_{j\in V}\mathrm{Var}[X_j \mid X_i] = \mathbb{E}_{i,j\in V}\,\mathrm{Cov}(X_i,X_j)^2 \cdot \frac{1}{2}\Big(\frac{1}{\mathrm{Var}[X_i]} + \frac{1}{\mathrm{Var}[X_j]}\Big) .$$

The above bound is proven in a more general setting in Lemma 5.2. As all random variables involved have variance at most 1, we can bound the above expression as,

$$\mathbb{E}_{j\in V}\mathrm{Var}[X_j] - \mathbb{E}_{i\in V,\{X_i\}}\Big[\mathbb{E}_{j\in V}\mathrm{Var}[X_j \mid X_i]\Big] \ge \mathbb{E}_{i,j\in V}|\mathrm{Cov}(X_i,X_j)|^2 .$$

The decrease in the variance is directly related to the global correlations between random pairs of vertices $i, j \in V$.

Recall that the failure of independent sampling yields a lower bound on the average local correlation on the edges, namely $\mathbb{E}_{ij\in E}|\mathrm{Cov}(X_i,X_j)|$. The crucial observation is that if the graph $G$ is a good expander in a suitable sense, then these local correlations translate into non-negligible global correlations. Formally, we show the following (in Section 6):

Lemma 4.1. Let $v_1, \ldots, v_n$ be vectors in the unit ball. Suppose that the vectors are correlated across the edges of a regular $n$-vertex graph $G$,

$$\mathbb{E}_{ij\sim G}\langle v_i, v_j\rangle \ge \rho .$$

Then, the global correlation of the vectors is lower bounded by

$$\mathbb{E}_{i,j\in V}|\langle v_i, v_j\rangle| \ge \Omega(\rho)/\mathrm{rank}_{\ge\Omega(\rho)}(G) ,$$

where $\mathrm{rank}_{\ge\rho}(G)$ is the number of eigenvalues of the adjacency matrix of $G$ that are larger than $\rho$.
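As a small illustration of the threshold rank $\mathrm{rank}_{\ge\rho}(G)$, the sketch below counts eigenvalues above a threshold for the $n$-cycle, whose normalized adjacency matrix has the closed-form eigenvalues $\cos(2\pi j/n)$ for $j = 0, \ldots, n-1$. The choice of the cycle and the function names are assumptions made for the example; they do not come from the text.

```python
import math


def cycle_eigenvalues(n):
    """Eigenvalues of the normalized adjacency matrix of the n-cycle (n >= 3)."""
    return [math.cos(2 * math.pi * j / n) for j in range(n)]


def threshold_rank(eigenvalues, rho):
    """Number of eigenvalues larger than rho."""
    return sum(1 for lam in eigenvalues if lam > rho)


eigs = cycle_eigenvalues(12)
```

On the 12-cycle only the eigenvalues $1$ and $\cos(\pi/6)$ (with multiplicity two) exceed $0.6$, so the threshold rank at $0.6$ is 3, and it drops to 1 at threshold $0.9$.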

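As the random variables $X_i$ arise from the solution to an SDP, the matrix $(\mathrm{Cov}(X_i, X_j))_{i,j\in V}$ is positive semidefinite, i.e., there exist vectors $u_i$ such that $\langle u_i, u_j\rangle = \mathrm{Cov}(X_i, X_j)$ for all $i, j \in V$. Let us consider the vectors $v_i = u_i^{\otimes 2}$. Suppose the local correlation $\mathbb{E}_{ij\in E}|\mathrm{Cov}(X_i,X_j)|$ is at least $\varepsilon$. Then we have,

$$\mathbb{E}_{ij\in E}\langle v_i, v_j\rangle = \mathbb{E}_{ij\in E}|\mathrm{Cov}(X_i,X_j)|^2 \ge \varepsilon^2$$

and $\mathbb{E}_i[\|v_i\|^2] \le 1$. If the graph $G$ is low-rank, then by Lemma 4.1 we get a lower bound on the global correlation of the vectors $v_i$, namely

$$\mathbb{E}_{i,j\in V}|\mathrm{Cov}(X_i,X_j)|^2 = \mathbb{E}_{i,j\in V}\langle v_i, v_j\rangle \ge \Omega(\varepsilon^2)/\mathrm{rank}_{\ge\varepsilon^2}(G) .$$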

Summarizing, if the independent sampling is on average $\varepsilon$-far from correlated sampling over the edges, then conditioning on the value of a random vertex $i \in V$ reduces the average variance by $\varepsilon^2/\mathrm{rank}_{\ge\varepsilon^2}(G)$ in expectation. The same argument can now be applied to the variables obtained after conditioning on $i$. In fact, starting with an SDP solution to the $m$-round Lasserre hierarchy, the local distributions remain consistent and their covariance matrices remain semidefinite as long as we condition on at most $m-2$ vertices. Observe that the average variance is at most 1. Hence, after at most $\mathrm{rank}_{\ge\varepsilon^2}(G)/\varepsilon^2$ steps, the independent sampling distribution will be within average distance $\varepsilon$ from the correlated sampling on the edges. The details of this argument are presented in Theorem 5.6.
5 General 2-CSP on Low Rank Graphs



Let $\Im$ be a (general) Max 2-Csp instance with variable set $V = [n]$ and label set $[k]$. (We represent $\Im$ as a distribution over triples $(i, j, \Pi)$, where $i, j \in V$ and $\Pi \subseteq [k] \times [k]$ is an arbitrary binary predicate. The goal is to find an assignment $x \in [k]^V$ that maximizes the probability $\mathbb{P}_{(i,j,\Pi)\sim\Im}\{(x_i, x_j) \in \Pi\}$.)

For simplicity,^ we will assume that the constraint graph of $\Im$ is regular, i.e., every variable $i \in V$ appears in the same number of constraints. (Since we allow the constraints to be weighted, the precise condition is that the total weight of the constraints incident to a vertex is the same for every vertex.)

Let $X_1, \ldots, X_n$ be $r$-local random variables with range $[k]$. We write $X_{ia}$ to denote the $\{0, 1\}$-indicator of the event $X_i = a$. Notice that $\{X_{ia}\}_{i\in V, a\in[k]}$ are also $r$-local random variables.

For two random variables $X$ and $X'$ with the same range, we denote their statistical distance,

$$\|\{X\} - \{X'\}\|_1 := \sum_x \big|\mathbb{P}\{X = x\} - \mathbb{P}\{X' = x\}\big| .$$

Independent Sampling and Pairwise Correlation. The following lemma shows that the statistical difference between independent sampling and correlated sampling is explained by local correlation.

Lemma 5.1. For any two vertices $i, j \in V$,

$$\|\{X_iX_j\} - \{X_i\}\{X_j\}\|_1 = \sum_{(a,b)\in[k]\times[k]} |\mathrm{Cov}(X_{ia}, X_{jb})| .$$

Proof. Under the distribution $\{X_iX_j\}$, the event $\{X_i = a, X_j = b\}$ has probability $\mathbb{E}\,X_{ia}X_{jb}$. On the other hand, under the product distribution $\{X_i\}\{X_j\}$, this event has probability $\mathbb{E}\,X_{ia}\,\mathbb{E}\,X_{jb}$. Hence, the difference of these probabilities is equal to $\mathbb{E}\,X_{ia}X_{jb} - \mathbb{E}\,X_{ia}\,\mathbb{E}\,X_{jb} = \mathrm{Cov}(X_{ia}, X_{jb})$. □

Conditional Variance and Pairwise Correlation. The following lemma shows that conditioning on a variable $X_j$ decreases the variance of a variable $X_i$ by the correlation of the variables $X_{ia}$ and $X_{jb}$.

Lemma 5.2. For any two vertices $i, j \in V$,

$$\mathrm{Var}\,X_i - \mathbb{E}_{\{X_j\}}\mathrm{Var}[X_i \mid X_j] \ge \frac{1}{k}\sum_{(a,b)\in[k]\times[k]} \mathrm{Cov}(X_{ia}, X_{jb})^2/\mathrm{Var}\,X_{jb} .$$

Proof. If we condition on $X_{jb}$, the variance of $X_{ia}$ decreases by $\mathrm{Cov}(X_{ia}, X_{jb})^2/\mathrm{Var}\,X_{jb}$ (Lemma C.2). Thus, the variance of $X_i$ decreases by $\sum_a \mathrm{Cov}(X_{ia}, X_{jb})^2/\mathrm{Var}\,X_{jb}$. Hence, there exists $b_0$ such that conditioning on $X_{jb_0}$ causes a variance decrement of at least $\frac{1}{k}\sum_{a,b}\mathrm{Cov}(X_{ia}, X_{jb})^2/\mathrm{Var}\,X_{jb}$. Since the variance is non-increasing under conditioning, the variance of $X_i$ decreases by at least this amount when we condition on $X_j$. □
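The inequality of Lemma 5.2 can be checked on a small example. The sketch below is only an illustration; it assumes that the variance of a $[k]$-valued variable is the sum of the variances of its indicators, $\mathrm{Var}\,X_i = \sum_a \mathrm{Var}\,X_{ia}$, and it skips labels $b$ with $\mathrm{Var}\,X_{jb} = 0$. The function name and the example distribution are hypothetical.

```python
def lemma_5_2_sides(joint):
    """joint[a][b] = P[X_i = a, X_j = b] for labels a, b in range(k).

    Returns (Var X_i - E_{X_j} Var[X_i | X_j], (1/k) sum_{a,b} Cov^2 / Var X_jb),
    with Var X_i taken to be sum_a Var X_ia (an assumption of this sketch).
    """
    k = len(joint)
    p_i = [sum(joint[a]) for a in range(k)]
    p_j = [sum(joint[a][b] for a in range(k)) for b in range(k)]
    var_i = sum(p * (1 - p) for p in p_i)
    cond_var = 0.0  # E_{X_j} Var[X_i | X_j]
    for b in range(k):
        if p_j[b] > 0:
            cond = [joint[a][b] / p_j[b] for a in range(k)]
            cond_var += p_j[b] * sum(q * (1 - q) for q in cond)
    bound = 0.0
    for a in range(k):
        for b in range(k):
            var_jb = p_j[b] * (1 - p_j[b])
            if var_jb > 0:
                cov = joint[a][b] - p_i[a] * p_j[b]  # Cov(X_ia, X_jb)
                bound += cov ** 2 / var_jb
    return var_i - cond_var, bound / k


joint = [[0.2, 0.05, 0.05], [0.05, 0.2, 0.05], [0.05, 0.05, 0.3]]
decrease, bound = lemma_5_2_sides(joint)
```

For independent variables both sides vanish, which matches the intuition that conditioning on an uncorrelated variable gains nothing.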

^If the constraint graph is not regular, all of our results still hold for an appropriate definition of threshold 
rank. 
Pairwise Correlations and Inner Products. The previous paragraphs were about two different notions of pairwise correlation: on the one hand, $\|\{X_iX_j\} - \{X_i\}\{X_j\}\|_1$, and on the other hand, $\mathrm{Var}\,X_i - \mathbb{E}_{\{X_j\}}\mathrm{Var}[X_i \mid X_j]$. The following lemma relates these two notions of pairwise correlation and shows they can be approximated by inner products of vectors.

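Lemma 5.3. Suppose that the matrix $(\mathrm{Cov}(X_{ia}, X_{jb}))_{i,j\in V,\ a,b\in[k]}$ is positive semidefinite. Then, there exist vectors $v_1, \ldots, v_n$ in the unit ball such that for all vertices $i, j \in V$,

$$\frac{1}{k^2}\Big(\sum_{(a,b)\in[k]\times[k]}|\mathrm{Cov}(X_{ia}, X_{jb})|\Big)^2 \le \langle v_i, v_j\rangle \le \frac{1}{k}\sum_{(a,b)\in[k]\times[k]}\frac{1}{2}\Big(\frac{1}{\mathrm{Var}\,X_{ia}} + \frac{1}{\mathrm{Var}\,X_{jb}}\Big)\mathrm{Cov}(X_{ia}, X_{jb})^2 .$$

Proof. Let $\{u_{ia}\}$ be the collection of vectors such that $\langle u_{ia}, u_{jb}\rangle = \mathrm{Cov}(X_{ia}, X_{jb})$. Note that $\|u_{ia}\|^2 = \mathrm{Var}\,X_{ia}$. Define $v_i := \frac{1}{\sqrt{k}}\sum_a \|u_{ia}\|\,\bar{u}_{ia}^{\otimes 2}$. (Here, $\bar{x}$ denotes the unit vector in direction $x$.) The inner product of $v_i$ and $v_j$ is equal to

$$\langle v_i, v_j\rangle = \frac{1}{k}\sum_{a,b}\frac{1}{\sqrt{\mathrm{Var}\,X_{ia}\,\mathrm{Var}\,X_{jb}}}\,\mathrm{Cov}(X_{ia}, X_{jb})^2 .$$

Using the inequality between arithmetic mean and geometric mean, we have $(\mathrm{Var}\,X_{ia}\,\mathrm{Var}\,X_{jb})^{-1/2} \le (1/\mathrm{Var}\,X_{ia} + 1/\mathrm{Var}\,X_{jb})/2$, which implies the desired upper bound on the inner product $\langle v_i, v_j\rangle$.

On the other hand, by Cauchy-Schwarz,

$$\Big(\sum_{a,b}|\mathrm{Cov}(X_{ia}, X_{jb})|\Big)^2 \le \sum_{a,b}\sqrt{\mathrm{Var}\,X_{ia}\,\mathrm{Var}\,X_{jb}} \cdot \sum_{a,b}\frac{1}{\sqrt{\mathrm{Var}\,X_{ia}\,\mathrm{Var}\,X_{jb}}}\,\mathrm{Cov}(X_{ia}, X_{jb})^2 .$$

Since $\sum_a \mathrm{Var}\,X_{ia} \le \sum_a \mathbb{E}\,X_{ia}^2 = 1$, we have $\sum_a\sqrt{\mathrm{Var}\,X_{ia}} \le \sqrt{k}$ for all vertices $i \in V$ (by Cauchy-Schwarz). Therefore,

$$\Big(\sum_{a,b}|\mathrm{Cov}(X_{ia}, X_{jb})|\Big)^2 \le k\sum_{a,b}\frac{1}{\sqrt{\mathrm{Var}\,X_{ia}\,\mathrm{Var}\,X_{jb}}}\,\mathrm{Cov}(X_{ia}, X_{jb})^2 ,$$

which gives the desired lower bound on the inner product $\langle v_i, v_j\rangle$. It remains to argue that the vectors $v_1, \ldots, v_n$ are contained in the unit ball. Since $\mathrm{Cov}(X_{ia}, X_{ib})^2 \le \mathrm{Var}\,X_{ia}\,\mathrm{Var}\,X_{ib}$, we can upper bound $\|v_i\|^2 \le \frac{1}{k}\sum_{a,b}\sqrt{\mathrm{Var}\,X_{ia}\,\mathrm{Var}\,X_{ib}} \le 1$ (using $\sum_a\sqrt{\mathrm{Var}\,X_{ia}} \le \sqrt{k}$). □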

Local Correlation vs Global Correlation on Low-Rank Graphs. The following lemma 
shows that local correlation (correlation across edges of a graph) implies global correla- 
tion (correlation between random vertices) if the graph has low threshold rank. (Proof in 
Section 6.) 

Lemma (Restatement of Lemma 4.1). Let $v_1, \ldots, v_n$ be vectors in the unit ball. Suppose that the vectors are correlated across the edges of a regular $n$-vertex graph $G$,

$$\mathbb{E}_{ij\sim G}\langle v_i, v_j\rangle \ge \rho .$$

Then, the global correlation of the vectors is lower bounded by

$$\mathbb{E}_{i,j\in V}|\langle v_i, v_j\rangle| \ge \Omega(\rho)/\mathrm{rank}_{\ge\Omega(\rho)}(G) ,$$

where $\mathrm{rank}_{\ge\rho}(G)$ is the number of eigenvalues of the adjacency matrix of $G$ that are larger than $\rho$.
Putting Things Together. The following lemma shows that either independent sampling is statistically close to correlated sampling across edges of a graph or the typical variance of a vertex decreases non-trivially by conditioning on a random vertex.

Lemma 5.4. Let $G$ be a regular $n$-vertex graph and $\varepsilon$ be the expected statistical distance between independent and correlated sampling across the edges of $G$,

$$\varepsilon = \mathbb{E}_{ij\sim G}\|\{X_iX_j\} - \{X_i\}\{X_j\}\|_1 .$$

Further, suppose that the matrix $(\mathrm{Cov}(X_{ia}, X_{jb}))_{i,j\in V,\ a,b\in[k]}$ is positive semidefinite. Then, conditioning on a random vertex decreases the variances by

$$\mathbb{E}_{i,j\in V}\,\mathbb{E}_{\{X_j\}}\mathrm{Var}[X_i \mid X_j] \le \mathbb{E}_{i\in V}\mathrm{Var}\,X_i - \Omega(\varepsilon^2/k)/\mathrm{rank}_{\ge\Omega(\varepsilon/k)^2}(G) .$$

Proof. Let $v_1, \ldots, v_n$ be the vectors constructed in Lemma 5.3. By Lemma 5.3 and Lemma 5.1, the local correlation of these vectors is at least

$$\mathbb{E}_{ij\sim G}\langle v_i, v_j\rangle \ge \frac{1}{k^2}\,\mathbb{E}_{ij\sim G}\|\{X_iX_j\} - \{X_i\}\{X_j\}\|_1^2 \ge \varepsilon^2/k^2 .$$

(The last step also uses Cauchy-Schwarz.) Hence, Lemma 4.1 implies the following lower bound on the global correlation of these vectors,

$$\mathbb{E}_{i,j\in V}|\langle v_i, v_j\rangle| \ge \Omega(\varepsilon/k)^2/\mathrm{rank}_{\ge\Omega(\varepsilon/k)^2}(G) .$$

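Lemma 5.3 and Lemma 5.2 allow us to relate the expected decrement of the variances to the global correlation of the vectors $v_1, \ldots, v_n$,

$$\mathbb{E}_{i,j\in V}\Big[\mathrm{Var}\,X_i - \mathbb{E}_{\{X_j\}}\mathrm{Var}[X_i \mid X_j]\Big] \ge k\cdot\mathbb{E}_{i,j\in V}|\langle v_i, v_j\rangle| ,$$

which gives the desired upper bound on $\mathbb{E}_{i,j\in V}\,\mathbb{E}_{\{X_j\}}\mathrm{Var}[X_i \mid X_j]$. □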

The following lemma asserts that if the constraint graph has low threshold rank then there exists a partial assignment $x_S$ to a small set $S$ of vertices such that independent sampling conditioned on this assignment $x_S$ gives almost the same value as correlated sampling (without conditioning on the assignment $x_S$).

Algorithm 5.5 (Propagation Sampling).

Input: $r$-local random variables $X_1, \ldots, X_n$ over $[k]$.

Output: (global) distribution over assignments $x \in [k]^V$.

1. Choose $m \in \{1, \ldots, r\}$ at random.

2. Sample a random set of "seed vertices" $S \in V^m$. (Repeated vertices are allowed.)

3. Sample an assignment $x_S \in [k]^S$ for $S$ according to its local distribution $\{X_S\}$.

4. For every other vertex $i \in V \setminus S$, sample a label $x_i \in [k]$ according to the local distribution for $S \cup \{i\}$ conditioned on the assignment $x_S$ for $S$.
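The four steps above can be sketched in code as follows. This is an illustrative sketch, not the authors' implementation: the two callbacks are hypothetical stand-ins for access to the local distributions of an SDP solution, and the toy local distributions are induced by an actual global distribution purely so that the example is runnable.

```python
import random


def propagation_sampling(n, r, sample_seed_assignment, sample_conditional, rng=random):
    """Sketch of Algorithm 5.5 on vertices 0..n-1.

    Hypothetical callbacks standing in for the local distributions:
      sample_seed_assignment(S)  -> dict {vertex: label} drawn from {X_S}
      sample_conditional(i, x_S) -> label drawn from {X_i | X_S = x_S}
    """
    m = rng.randint(1, r)                              # step 1
    seeds = [rng.randrange(n) for _ in range(m)]       # step 2, repeats allowed
    x_S = sample_seed_assignment(sorted(set(seeds)))   # step 3
    x = dict(x_S)
    for i in range(n):                                 # step 4
        if i not in x:
            x[i] = sample_conditional(i, x_S)
    return [x[i] for i in range(n)]


# Toy local distributions (an assumption for illustration): they are induced by
# a global distribution that is uniform over two assignments.
SUPPORT = [[0, 1, 0, 1], [1, 0, 1, 0]]


def toy_seed(S, rng=random):
    g = rng.choice(SUPPORT)
    return {i: g[i] for i in S}


def toy_conditional(i, x_S, rng=random):
    consistent = [g for g in SUPPORT if all(g[j] == a for j, a in x_S.items())]
    return rng.choice(consistent)[i]
```

In this toy instance every variable is determined by any single seed, so `propagation_sampling(4, 2, toy_seed, toy_conditional)` always returns one of the two assignments in `SUPPORT`; this is the extreme case in which conditioning removes all variance.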
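Theorem 5.6. Let $X_1, \ldots, X_n$ be $r$-local random variables and let $X'_1, \ldots, X'_n$ be the random variables produced by Algorithm 5.5 on input $X_1, \ldots, X_n$. Suppose that the matrices $(\mathrm{Cov}(X_{ia}, X_{jb} \mid X_S = x_S))_{i,j\in V,\ a,b\in[k]}$ are positive semidefinite for every set $S \subseteq V$ with $|S| \le r$ and local assignment $x_S \in [k]^S$. Then, if $r \gg O(k/\varepsilon^4)\cdot\mathrm{rank}_{\ge\Omega(\varepsilon/k)^2}(G)$,

$$\mathbb{E}_{ij\sim G}\|\{X_iX_j\} - \{X'_iX'_j\}\|_1 \le \varepsilon .$$

Proof. Let us define $\varepsilon_m$,

$$\varepsilon_m := \mathbb{E}_{S\in V^m}\,\mathbb{E}_{\{X_S\}}\,\mathbb{E}_{ij\sim G}\|\{X_iX_j \mid X_S\} - \{X_i \mid X_S\}\{X_j \mid X_S\}\|_1 .$$

To prove the current theorem it is enough to show that $\mathbb{E}_{m\in[r]}\,\varepsilon_m \le \varepsilon$. For $m \le r$, define a non-negative potential $\Phi_m$ as follows

$$\Phi_m := \mathbb{E}_{S\in V^m}\,\mathbb{E}_{\{X_S\}}\,\mathbb{E}_{i\in V}\mathrm{Var}(X_i \mid X_S) .$$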

Let me [r]. Suppose Sm > e/2. Then, 



SeV".{Xs\ [ij~G " J Ml J 

Therefore, by Lemma 5.4, 

^^^E^^ J .E Var[X,- | Xs] - .E^Var[X,- | Xs,Xj]\ > e/2 • n{E^ /k)/rmk;,^^,/j,^2(G) . 

In other words, O^+i < 0,„ - Q(£^/^)/rank^Q(e/^)2(G). Since 1 ^ (Di > . . . ^ O,. ^ 0, it 
follows that there are at most 0(k/E^) ■ rank^Q(e/^)2(G) indices m e [r] such that e„, > e/2. 
Therefore, if r » 0{klE^) ■ rank^Q(e/^)2(G), we have 

E Em<£/2 + J- 0{k/E^) ■ rank^jj(£rt)2(G) < £ . 

me[r] 

Finally, by the triangle inequality, 

.E ||ix,x,i-ix;x;)||^ 

- E IK E E E{X,X^ |Xs})-( E E E {X,- | X^KX^- | Xj)) 11 

< E E E E ||{X,X^- I Xs} - {X; I Xs}{Xj \ Xs}\l - E £,„ < e . □ 

ii~Gme[r]SeV'" [Xs]^^ J Ml „,g|-^] 

The following theorem directly implies Theorem 1.1.

Theorem 5.7. Let $\varepsilon > 0$ and $r = O(k)\cdot\mathrm{rank}_{\ge\Omega(\varepsilon/k)^2}(G)/\varepsilon^4$. Suppose that the $r$-round Lasserre value of the Max 2-Csp instance $\Im$ is $\sigma$. Then, given an optimal $r$-round Lasserre solution, Algorithm 5.5 (Propagation Sampling) outputs an assignment with expected value at least $\sigma - \varepsilon$ for $\Im$.

Proof. An optimal $r$-round Lasserre solution gives rise to $r$-local random variables $X_1, \ldots, X_n$ over $[k]$. Let $X_{ia}$ be the indicator variable of the event $X_i = a$. The matrices $(\mathrm{Cov}(X_{ia}, X_{jb} \mid X_S = x_S))_{i,j\in V,\ a,b\in[k]}$ are positive semidefinite for all sets $S \subseteq V$ with $|S| \le r$ and local assignments $x_S \in [k]^S$. Furthermore, the Lasserre solution satisfies

$$\mathbb{E}_{(i,j,\Pi)\sim\Im}\ \mathbb{P}_{\{X_iX_j\}}\{(X_i, X_j) \in \Pi\} \ge \sigma .$$

Let $X'_1, \ldots, X'_n$ be the jointly-distributed (global) random variables in Theorem 5.6. By Theorem 5.6, we can estimate the expected value of the assignment $X'_1, \ldots, X'_n$ as

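$$\mathbb{E}_{X'_1,\ldots,X'_n}\ \mathbb{P}_{(i,j,\Pi)\sim\Im}\{(X'_i, X'_j) \in \Pi\} = \mathbb{E}_{(i,j,\Pi)\sim\Im}\ \mathbb{P}_{\{X'_iX'_j\}}\{(X'_i, X'_j) \in \Pi\}$$

$$\ge \mathbb{E}_{(i,j,\Pi)\sim\Im}\ \mathbb{P}_{\{X_iX_j\}}\{(X_i, X_j) \in \Pi\} - \tfrac{1}{2}\,\mathbb{E}_{ij\sim G}\|\{X_iX_j\} - \{X'_iX'_j\}\|_1$$

$$\ge \sigma - \varepsilon . \qquad □$$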
5.1 Special case of Unique Games 

The following lemma is a version of Lemma 5.3 tailored towards Unique Games. The advantage of this version of the lemma is that the bounds are independent of the alphabet size $k$.

Lemma 5.8. Let $X_1, \ldots, X_n$ be $r$-local random variables over $[k]$ and let $X_{ia}$ be the indicator of the event $X_i = a$. Suppose that the matrix $(\mathrm{Cov}(X_{ia}, X_{jb}))_{i,j\in V,\ a,b\in[k]}$ is positive semidefinite. Then, there exist vectors $v_1, \ldots, v_n$ in the unit ball such that for all vertices $i, j \in V$ and permutations $\pi$ of $[k]$,

(2]|cov(x,„x,>(„))|)%(^,,^,->< i(vak + wx;r)Cov(x,,,x,,)2. 

The following theorem immediately implies Theorem 1.2. Let 3 be a Unique Games 
instance with alphabet size k and constraint graph G. 

Theorem 5.9. Let e > and r — k- rankj,Q^£4)(G)/e'^^'^. Suppose that the r-round Lasserre 
value of the Unique Games instance 3 is cr. Then, given an optimal r-round Lasserre 
solution. Algorithm 5.5 (Propagation Sampling) outputs an assignment with expected value 
at least cr - efor 3 . 

Proof Sketch. Let $X_1, \ldots, X_n$ be $r$-local random variables over $[k]$ from an optimal $r$-round Lasserre solution for $\Im$. The local variables satisfy

$$\mathbb{E}_{(i,j,\pi)\sim\Im}\ \mathbb{P}_{\{X_iX_j\}}\{X_j = \pi(X_i)\} = \sigma .$$

For a permutation n of \k\, we define a modified version of statistical distance, 
II {X,-X,} - {X,){X,) ll/- y I P {Xy-7r(X0)- P {X,- = 7r(X,))| . 

The following analog of Lemma 5.1 holds, 

II {XiX,} - {Xi}{Xj} lU = 2lCov(X-„,Xy,(«))| . 

a 

Using Lemma 5.8, it is straight-forward to prove a better versions of Lemma 5.4 and 
Theorem 5.6 for our modified notion of statistical distance. The conclusion is that for 
r ^ k ■ rank^Q(£4)(G)/£^, Algorithm 5.5 (Propagation Sampling) produces (global) random 
variables Xj , . . . , X,' such that 

E ||{X,Xy}-{X;-X'||U<e. 

(!,;>- 3) 
Therefore, we can estimate the expected fraction of satisfied constraints as

E p [x'.^n{x;)]> E p [xj^7T{Xi)]- E \\{XiX j] - {x; - X'.}\1 

> cr - £ . 



6 Local Correlation implies Global Correlation in Low-Rank 
Graphs 

Let $G$ be a regular graph with vertex set $V = \{1, \ldots, n\}$. We identify $G$ with its normalized adjacency matrix, a symmetric stochastic matrix. Let $\lambda_1 \ge \ldots \ge \lambda_n \in [-1, 1]$ be the eigenvalues of $G$ in non-increasing order.

The following lemma shows that a violation of the local vs global correlation condition implies that the graph has high threshold rank.

Lemma 6.1. Suppose there exist vectors $v_1, \ldots, v_n \in \mathbb{R}^n$ such that

$$\mathbb{E}_{ij\sim G}\langle v_i, v_j\rangle \ge 1 - \varepsilon , \qquad \mathbb{E}_{i,j\in V}\langle v_i, v_j\rangle^2 \le \frac{1}{m} , \qquad \mathbb{E}_{i\in V}\|v_i\|^2 = 1 .$$

Then for all $C > 1$, $\lambda_{(1-1/C)^2 m} \ge 1 - C\cdot\varepsilon$. In particular, $\lambda_{m/4} \ge 1 - 2\varepsilon$.

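Proof. Let $X = (x_{r,s})_{r,s\in[n]}$ be the Gram matrix $(\langle v_i, v_j\rangle)_{i,j\in V}$ represented in the eigenbasis of $G$, so that

$$\mathbb{E}_{ij\sim G}\langle v_i, v_j\rangle = \sum_{r\in[n]}\lambda_r x_{r,r} , \qquad \mathbb{E}_{i,j\in V}\langle v_i, v_j\rangle^2 = \sum_{r,s\in[n]} x_{r,s}^2 , \qquad \mathbb{E}_{i\in V}\|v_i\|^2 = \sum_{r\in[n]} x_{r,r} .$$

Let $m'$ be the largest index such that $\lambda_{m'} \ge 1 - C\cdot\varepsilon$. Notice that the numbers $p_1 = x_{1,1}, \ldots, p_n = x_{n,n}$ form a probability distribution over $r \in [n]$. Let $q = \sum_{r=1}^{m'} p_r$ be the probability of the event $r \le m'$. Using Cauchy-Schwarz, we can bound this probability in terms of $m$,

$$q = \sum_{r=1}^{m'} x_{r,r} \le \sqrt{m'}\cdot\Big(\sum_{r=1}^{n} x_{r,r}^2\Big)^{1/2} \le \sqrt{m'/m} .$$

On the other hand, we can bound the expectation of $\lambda_r$ with respect to the probability distribution $(p_1, \ldots, p_n)$ in terms of this probability $q$,

$$1 - \varepsilon \le \sum_{r=1}^{n}\lambda_r p_r \le \sum_{r=1}^{m'} p_r + (1 - C\cdot\varepsilon)\sum_{r=m'+1}^{n} p_r = 1 - (1 - q)\,C\cdot\varepsilon \le 1 - \big(1 - \sqrt{m'/m}\big)\,C\cdot\varepsilon .$$

It follows that $m' \ge (1 - 1/C)^2\cdot m$, which gives the desired conclusion that $G$ has at least $(1 - 1/C)^2\cdot m$ eigenvalues $\lambda_i \ge 1 - C\cdot\varepsilon$. □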

Note that Lemma 4. 1 follows directly from the previous lemma by picking C = 
and observing that ^ijev K'Vi, Vj}\ ^ ^ijev K^i, Vj}\^ since \{vi, Vj)\ < 1 for all /, j eV 

As a converse to Lemma 6.1, the following lemma shows that if a graph has many 
eigenvalues close to 1, then there exist vectors for the vertices of the graph with high local 
correlation and low global correlation. 
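Lemma 6.2. If $\lambda_m > 1 - \varepsilon$, then there exist vectors $v_1, \ldots, v_n \in \mathbb{R}^m$ such that

$$\mathbb{E}_{ij\sim G}\langle v_i, v_j\rangle \ge 1 - \varepsilon , \qquad \mathbb{E}_{i,j\in V}\langle v_i, v_j\rangle^2 = \frac{1}{m} , \qquad \mathbb{E}_{i\in V}\|v_i\|^2 = 1 .$$

Proof. Let $f^{(1)}, \ldots, f^{(m)} \colon V \to \mathbb{R}$ be orthonormal eigenfunctions of $G$ with eigenvalue larger than $1 - \varepsilon$. Consider vectors $v_1, \ldots, v_n \in \mathbb{R}^m$ satisfying $\langle v_i, v_j\rangle = \mathbb{E}_{r\in[m]} f^{(r)}_i f^{(r)}_j$. Since the functions $f^{(r)}$ have norm 1, the typical squared norm of the vectors $v_i$ satisfies

$$\mathbb{E}_{i\in V}\|v_i\|^2 = \mathbb{E}_{r\in[m]}\|f^{(r)}\|^2 = 1 .$$

Since the eigenvalues of the eigenfunctions $f^{(r)}$ are larger than $1 - \varepsilon$, we can lower bound the local correlation of the vectors $v_i$,

$$\mathbb{E}_{ij\sim G}\langle v_i, v_j\rangle = \mathbb{E}_{r\in[m]}\langle f^{(r)}, G f^{(r)}\rangle \ge 1 - \varepsilon .$$

Finally, since the functions $f^{(r)}$ are orthonormal, the global correlation of the vectors $v_i$ is

$$\mathbb{E}_{i,j\in V}\langle v_i, v_j\rangle^2 = \mathbb{E}_{r,s\in[m]}\,\mathbb{E}_{i,j\in V} f^{(r)}_i f^{(r)}_j f^{(s)}_i f^{(s)}_j = \mathbb{E}_{r,s\in[m]}\langle f^{(r)}, f^{(s)}\rangle^2 = \frac{1}{m} . \qquad □$$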

Remark 6.3. The condition that there exist vectors vi, . . . ,v„ e W with 

E {vi,v;) > 1 - e, 

ij~G 

E|rf = 1. 

is equivalent to the condition that there exists a symmetric positive semidefinite matrix 
X € R^^^ such that 

TrGX >\-£, 
TrX^ < 1/m, 
TtX= 1. 



7 On Low Rank Approximations to Sets of Vectors 

Theorem 7.1. Let $v_1, \ldots, v_n \in \mathbb{R}^n$ be vectors in the unit ball. Then for every $\varepsilon > 0$, there exists a subset $U \subseteq \{v_1, \ldots, v_n\}$ with $|U| \le 1/\varepsilon$ such that $\mathbb{E}_{i,j\in[n]}\|w_i\|\,\|w_j\|\,\langle\bar{w}_i, \bar{w}_j\rangle^2 \le \varepsilon$, where $w_i$ is the projection of $v_i$ to the orthogonal complement of the span of $U$.

The proof of Theorem 7.1 is by an iterative construction. In each iteration, we will use the following lemma.

Lemma 7.2. Let $v_1, \ldots, v_n \in \mathbb{R}^n$ be vectors. Then, there exists a unit vector $u \in \{\bar{v}_1, \ldots, \bar{v}_n\}$ such that the vectors $v'_1, \ldots, v'_n$ with $v'_i = v_i - \langle v_i, u\rangle u$ satisfy the following condition,

$$\mathbb{E}_{i\in[n]}\|v'_i\|^2 \le \mathbb{E}_{i\in[n]}\|v_i\|^2 - \mathbb{E}_{i,j\in[n]}\|v_i\|\,\|v_j\|\,\langle\bar{v}_i, \bar{v}_j\rangle^2 .$$
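Proof. Suppose we pick a random index $j \in [n]$ and choose $u = \bar{v}_j$. In this case, the squared norm of the vectors $v'_i = v_i - \langle v_i, u\rangle u$ equals

$$\|v'_i\|^2 = \|v_i\|^2 - \langle v_i, u\rangle^2 = \big(1 - \langle\bar{v}_i, \bar{v}_j\rangle^2\big)\|v_i\|^2 .$$

Hence, we can estimate the expected decrease of the typical squared norms for a random vector $u \in \{\bar{v}_1, \ldots, \bar{v}_n\}$,

$$\mathbb{E}_{i\in[n]}\|v_i\|^2 - \mathbb{E}_{j\in[n]}\,\mathbb{E}_{i\in[n]}\|v'_i\|^2 = \mathbb{E}_{i,j\in[n]}\big(\tfrac{1}{2}\|v_i\|^2 + \tfrac{1}{2}\|v_j\|^2\big)\langle\bar{v}_i, \bar{v}_j\rangle^2 \ge \mathbb{E}_{i,j\in[n]}\|v_i\|\,\|v_j\|\,\langle\bar{v}_i, \bar{v}_j\rangle^2 .$$

It follows that there exists a unit vector $u \in \{\bar{v}_1, \ldots, \bar{v}_n\}$ such that the vectors $v'_i = v_i - \langle v_i, u\rangle u$ have the desired property

$$\mathbb{E}_{i\in[n]}\|v'_i\|^2 \le \mathbb{E}_{i\in[n]}\|v_i\|^2 - \mathbb{E}_{i,j\in[n]}\|v_i\|\,\|v_j\|\,\langle\bar{v}_i, \bar{v}_j\rangle^2 . \qquad □$$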

ie[n\ ;e[«] 

Proof of Theorem 7.1. We can construct the set $U$ in a greedy fashion so as to minimize the total squared norm of the vectors $w_1, \ldots, w_n$ (the projections of the vectors to the orthogonal complement of the span of $U$). (In fact, we could choose the set $U$ randomly.) To make the analysis more convenient, we use the following, slightly different construction.

1. Let $v^{(1)}_i = v_i$ for all $i \in [n]$.

2. For $t$ from 1 to $1/\varepsilon$, construct vectors $u^{(t)} \in \mathbb{R}^n$ and $v^{(t+1)}_1, \ldots, v^{(t+1)}_n \in \mathbb{R}^n$ as follows:

(a) Using Lemma 7.2, pick a unit vector $u^{(t)} \in \{\bar{v}^{(t)}_1, \ldots, \bar{v}^{(t)}_n\}$ such that the vectors $v^{(t+1)}_i = v^{(t)}_i - \langle v^{(t)}_i, u^{(t)}\rangle u^{(t)}$ satisfy the condition

$$\mathbb{E}_{i\in[n]}\|v^{(t+1)}_i\|^2 \le \mathbb{E}_{i\in[n]}\|v^{(t)}_i\|^2 - \mathbb{E}_{i,j\in[n]}\|v^{(t)}_i\|\,\|v^{(t)}_j\|\,\langle\bar{v}^{(t)}_i, \bar{v}^{(t)}_j\rangle^2 .$$

Notice that the vectors $v^{(t)}_1, \ldots, v^{(t)}_n$ are the projections of the vectors $v_1, \ldots, v_n$ into the orthogonal complement of the span of the vectors $u^{(1)}, \ldots, u^{(t-1)}$. Let $U$ be the set of all indices $j$ such that $u^{(t)} = \bar{v}^{(t)}_j$ for some $t \in \{1, \ldots, 1/\varepsilon\}$. We can verify that the vectors $u^{(1)}, \ldots, u^{(1/\varepsilon)}$ are an orthonormal basis of the span of $U$. Let $w_1, \ldots, w_n$ be the projections of the vectors $v_1, \ldots, v_n$ into the orthogonal complement of the span of $U$ (so that $w_i = v^{(1/\varepsilon+1)}_i$). Since the vectors $w_1, \ldots, w_n$ are projections of the vectors $v^{(t)}_1, \ldots, v^{(t)}_n$ for all $t \in \{1, \ldots, 1/\varepsilon\}$, it follows that

$$\mathbb{E}_{i,j\in[n]}\|w_i\|\,\|w_j\|\,\langle\bar{w}_i, \bar{w}_j\rangle^2 \le \mathbb{E}_{i,j\in[n]}\|v^{(t)}_i\|\,\|v^{(t)}_j\|\,\langle\bar{v}^{(t)}_i, \bar{v}^{(t)}_j\rangle^2 .$$

Hence, we can bound the typical squared norm of the vectors $w_i$,

$$\mathbb{E}_{i\in[n]}\|w_i\|^2 \le \mathbb{E}_{i\in[n]}\|v_i\|^2 - \frac{1}{\varepsilon}\,\mathbb{E}_{i,j\in[n]}\|w_i\|\,\|w_j\|\,\langle\bar{w}_i, \bar{w}_j\rangle^2 .$$

Since the left-hand side is nonnegative and $\mathbb{E}_{i\in[n]}\|v_i\|^2 \le 1$, it follows that $\mathbb{E}_{i,j\in[n]}\|w_i\|\,\|w_j\|\,\langle\bar{w}_i, \bar{w}_j\rangle^2 \le \varepsilon$, as desired. □
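One round of the iterative construction can be sketched as follows. This is an illustration of Lemma 7.2 only, not the authors' code: instead of sampling $j$ at random it tries every normalized input vector and keeps the best one, which satisfies the lemma's inequality because the average over $j$ already does. All function names are hypothetical.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def project_out(vectors, u):
    """Replace each v by v - <v, u> u for a unit vector u."""
    return [[a - dot(v, u) * b for a, b in zip(v, u)] for v in vectors]


def avg_sq_norm(vectors):
    return sum(dot(v, v) for v in vectors) / len(vectors)


def correlation(vectors):
    """E_{i,j} ||v_i|| ||v_j|| <v_i/||v_i||, v_j/||v_j||>^2, skipping zero vectors."""
    total = 0.0
    for v in vectors:
        for w in vectors:
            nv, nw = math.sqrt(dot(v, v)), math.sqrt(dot(w, w))
            if nv > 1e-12 and nw > 1e-12:
                total += dot(v, w) ** 2 / (nv * nw)
    return total / len(vectors) ** 2


def best_direction(vectors):
    """Normalized input vector whose removal leaves the least average squared norm."""
    best, best_val = None, float("inf")
    for v in vectors:
        nv = math.sqrt(dot(v, v))
        if nv > 1e-12:
            u = [a / nv for a in v]
            val = avg_sq_norm(project_out(vectors, u))
            if val < best_val:
                best, best_val = u, val
    return best


vectors = [[1.0, 0.0], [0.6, 0.8], [0.0, 0.5]]
u = best_direction(vectors)
projected = project_out(vectors, u)
```

Repeating this step $1/\varepsilon$ times on the projected vectors gives the construction used in the proof above.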

For our applications it will sometimes be convenient to associate different subspaces with subsets $U$ of vectors (in Theorem 7.1, we associate the span of the vectors in $U$ with the subset $U$).

Theorem 7.3. Let $v_1, \ldots, v_n \in \mathbb{R}^n$ be vectors in the unit ball. For every subset $U \subseteq \{v_1, \ldots, v_n\}$, let $Q_U$ be the projector on some subspace orthogonal to the span of $U$. (Note that $Q_U$ is not necessarily the projector on the orthogonal complement of the span of $U$.) Then for every $\varepsilon > 0$, there exists a subset $U \subseteq \{v_1, \ldots, v_n\}$ with $|U| \le 1/\varepsilon$ such that $\mathbb{E}_{i,j\in[n]}\|w_i\|\,\|w_j\|\,\langle\bar{w}_i, \bar{w}_j\rangle^2 \le \varepsilon$, where $w_i = Q_U v_i$.

Proof. We use the same construction as in the proof of Theorem 7.1. The only difference is that we define $v^{(t+1)}_i = Q_{U^{(t)}} v_i$ (instead of $v^{(t+1)}_i = v^{(t)}_i - \langle v^{(t)}_i, u^{(t)}\rangle u^{(t)}$). Here, $U^{(t)}$ is the set of all indices $j$ such that $u^{(t')} = \bar{v}^{(t')}_j$ for some $t' \le t$. The proof still applies to this modified construction because $\|v^{(t+1)}_i\| \le \|v^{(t)}_i - \langle v^{(t)}_i, u^{(t)}\rangle u^{(t)}\|$ (which is the only fact used about these vectors). □



8 Rounding SDP Solutions to Unique Games 

In this section, we will present a subexponential time algorithm for Unique Games based on an SDP hierarchy, namely the simple SDP augmented with the Sherali-Adams hierarchy. This hierarchy of relaxations, which is weaker than the Lasserre hierarchy, was studied in some earlier works [RS09, KS09]. Roughly speaking, the $m$th round relaxation in this hierarchy corresponds to the basic semidefinite program, along with all valid constraints on at most $m$ vectors. Formally, the variables in the $m$th round relaxation for Unique Games consist of

- A collection of "local distributions" $\{\mu_T\}_{T\subseteq V,\ |T|\le m}$. Each distribution $\mu_T$ is over local assignments $x_T \in [k]^T$.

- A set of vectors $\mathcal{V} = \{v_{ia}\}_{i\in V,\ a\in[k]}$ with $k$ orthogonal vectors for every vertex $i \in V$.

The constraints of the SDP relaxation ensure that the inner products of the vectors are consistent with the corresponding local distributions, i.e., for all $S \subseteq V$, $|S| \le m$, $i, j \in S$ and $a, b \in [k]$,

$$\mathbb{P}_{\mu_S}[X_i = a \wedge X_j = b] = \langle v_{ia}, v_{jb}\rangle .$$

The objective value of the SDP corresponds to minimizing the number of violated constraints,

$$\text{Minimize } \mathbb{E}_{ij\in E}\sum_{a\in[k]}\|v_{ia} - v_{j\pi_{ij}(a)}\|^2 .$$



8.1 Propagation Rounding 



Let $\{\mu_T\}_{T\subseteq V,\ |T|\le m}$ be a set of consistent local distributions over assignments. For a subset of vertices $S$, the distribution $\mu^{|S}$ over global assignments is sampled as follows:

1. Sample an assignment $x_S \in [k]^S$ for $S$ according to its local distribution $\mu_S$.

2. For every other vertex $i \in V \setminus S$, sample a label $x_i \in [k]$ according to the local distribution for $S \cup \{i\}$ conditioned on the assignment $x_S$ for $S$.

The above procedure will be referred to as propagation rounding and the set $S$ of vertices will be called the seed vertices.

The following lemma implies that if the seed vertices $S$ nearly determine the values of a set of vertices $T$, then the assignment output by the propagation rounding has a distribution similar to the local distribution $\mu_T$ that is part of the LP/SDP solution (hence gets close to the SDP value).

Lemma 8.1. For a set $S \subseteq V$ with $|S| = m - t$, let $\mu^{|S}$ denote the distribution over global assignments $x \in [k]^V$ output by propagation rounding with $S$ as the seed vertex set. Then, for every subset $T$ with $|T| \le t$, the statistical distance between $\mu_T$ and $\mu^{|S}_T$ is at most $\sum_{t\in T}\mathrm{Var}[X_t \mid X_S]$.
Proof. Consider the following experiment,

1. Sample an assignment $x_S \in [k]^S$ for $S$ according to its local distribution $\mu_S$,

$$x_S \sim \mu_S .$$

2. Sample an assignment $y_T \in [k]^T$ according to the local distribution for $S \cup T$ conditioned on the assignment $x_S$ for $S$,

$$y_T \sim \mu_{S\cup T} \mid x_S .$$

3. For every vertex $t \in T$, sample a label $x_t \in [k]$ according to the local distribution for $S \cup \{t\}$ conditioned on the assignment $x_S$ for $S$,

$$x_t \sim \mu_{S,t} \mid x_S .$$

Clearly the distribution of $y_T$ is $\mu_T$, while the distribution of $x_T$ is $\mu^{|S}_T$. For any $t \in T$, the coordinates $x_t$ and $y_t$ are independent samples from $\mu_{S,t} \mid x_S$. Therefore we have,

$$\mathbb{P}[x_t \ne y_t \mid x_S] = 1 - \sum_{a\in[k]}\mathbb{P}[X_t = a \mid x_S]^2 = \mathrm{Var}[(X_t \mid x_S)] .$$

By a union bound we get,

$$\mathbb{P}[x_T \ne y_T \mid x_S] \le \sum_{t\in T}\mathrm{Var}[(X_t \mid x_S)] .$$

Averaging over the different choices of $x_S$,

$$\mathbb{P}[x_T \ne y_T] \le \sum_{t\in T}\mathrm{Var}[X_t \mid X_S] .$$

Therefore, the statistical distance between the distributions $\mu_T$ and $\mu^{|S}_T$ associated with $y_T$ and $x_T$ is at most $\sum_{t\in T}\mathrm{Var}[X_t \mid X_S]$. □
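The key identity in the proof, that two independent samples from the same distribution over $[k]$ disagree with probability equal to the variance, can be checked directly. The sketch below is an illustration; it assumes the variance of a $[k]$-valued variable is the sum of the variances of its indicators, which equals $1 - \sum_a p_a^2$. The function names are hypothetical.

```python
def disagreement_probability(p):
    """P[x != y] for two independent samples x, y from the distribution p."""
    k = len(p)
    return sum(p[a] * p[b] for a in range(k) for b in range(k) if a != b)


def label_variance(p):
    """Var X = sum_a Var X_a = sum_a p_a (1 - p_a) for the indicators X_a."""
    return sum(q * (1 - q) for q in p)


p = [0.5, 0.3, 0.2]
```

For the distribution `p` above both quantities equal 0.62; a point mass gives 0 for both, matching the case where the seed vertices fully determine the label.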
8.2 Unique Games on Low Rank Graphs

Let $\mathcal{G}$ be an instance of unique games whose constraint graph $G$ has low threshold rank. Let $\mathcal{V} = \{v_{ia}\}_{i\in V,\ a\in[k]}$ be an SDP solution for $\mathcal{G}$, and let $\{\mu_S\}_{S\subseteq V,\ |S|\le m}$ denote the associated set of local distributions. Let $X_1, \ldots, X_n$ denote the associated $m$-local random variables. The main result of this section shows that there exists a small set of seed vertices such that fixing their values determines the value of almost every other vertex. Formally, we show the following.

Lemma 8.2. For every integer $m$, there exists a subset of vertices $S \subseteq V$ of size $|S| = k^2 m$ such that

$$\mathbb{E}_{i\in V}\big[\mathrm{Var}[X_i \mid X_S]\big] \le O\Big(\frac{\eta}{\lambda_m}\Big) .$$

To this end, we will relate conditioning a random variable $X_i$ on a set $X_S$ to projecting the SDP vectors corresponding to the variable $X_i$ into the span of the vectors corresponding to $X_S$. This analogy is formalized in the following lemma.

Lemma 8.3. Let $X_1, X_2, \ldots, X_r$ be random variables with range $[k]$ with a joint distribution $\mu$ associated with them. For each $i \in [r]$, $a \in [k]$, let $X_{ia}$ be the indicator of the event that $X_i = a$. Let us suppose there exist vectors $\{v_{ia}\}_{i\in[r],\ a\in[k]}$ such that

$$\langle v_{ia}, v_{jb}\rangle = \mathbb{P}_{\mu}\{X_i = a, X_j = b\} = \mathbb{E}[X_{ia}X_{jb}] .$$

Then, for every subset $S \subseteq [r]$ we have (1) $\mathrm{Var}[X_{ia} \mid X_S] \le \|P_S v_{ia}\|^2$, and (2) $\mathrm{Var}[X_i \mid X_S] \le \sum_{a\in[k]}\|P_S v_{ia}\|^2$, where $P_S$ is the projector of $\mathbb{R}^n$ onto the space orthogonal to the span of $\{v_{jb}\}_{j\in S,\ b\in[k]}$.

Proof. Let us suppose Vja = Yjjes,he[k]^ jb^jb + Ps^ia- Define a random variable C5 as 
follows, 

Cs = CjhXjh. 

jeS,bs[k] 

Note that on fixing the values {Xyjyg^, the random variable Cs is fixed. 

By the definition of variance of a real random variable we have the following inequality. 

Var [(X,-, 1x5)] = minE[(X,v, - Cf\xs] < E [(X,-, - CsfUs] . 

C XJxs 

Averaging the above inequality over the settings of xs , we get 

Var[X,-JX5] - E Var({X;Jx5l) < E E [(X,-, - Csf\xs] - E 

(8.1) 

Note that the second moments of the random variables {Xia}ie[r],ae[k^ match with the corre- 
sponding inner products of vectors {via}ie[r\ae[k\- Hence, 



jeSMlk] 



E 

M 



{Xia - ^ CjhXjhf 
jeS,be[k] 



\Via- Yj CjtVjtf = \\PsViaf. (8.2) 

jeS,be[k] 



The claim (1) follows from (8.1) and (8.2). 

The claim (2) follows from (1) and the definition of variance of a random variable taking 
values over [k]. □ 
Proof of Lemma 8.2. For a subset $T \subseteq \mathcal{V} = \{v_{ia}\}_{i\in V,\ a\in[k]}$, let $S_T$ be the set of vertices associated with it, namely,

$$S_T = \{i \in V \mid \exists b \in [k],\ v_{ib} \in T\} .$$

Let $Q_T$ denote the projector onto the subspace orthogonal to the span of $\{v_{ia} \mid i \in S_T,\ a \in [k]\}$. In particular, $Q_T$ is a projector onto a subspace orthogonal to $T$ for all $T \subseteq \mathcal{V}$.

Apply Theorem 7.3 on the set of vectors "V = {vja}iev,ae[k] with the projectors Qj for a 
subset T Q 'V. Theorem 7.3 implies that there exists a choice of 7 c "V of size \T\ = k^m 
such that if Uia = QrVta then, 

- - 9 1 

E WUiaW WUjbW (Uia, UjhY < 75— (8-3) 
i,i,a,b k^m 

Let 5 r c V be the vertex set associated with the set of vectors T. We will drop the subscript 
and refer to this set as S . Let P5 denote the projector in to the space orthogonal to the span 

of vectors {Via]ieS,aem- 

Let us fix some notation: m,q = PsVia , m,o - IImiqIIm^'^ ® Via. As for each / € V, the 
vectors {via]ae[k] are orthogonal to each other, the set of vectors {uia}ae[k] are orthogonal 

def 

to each other too. For each vertex / € V, we can associate a vector Ui defined as Ui = 

Yiae[k] ^ia- 

From (8.3), we get the following bound on the average correlation of vectors 

{Uia}ieV,ae[k], 

- - 7 1 

^{Uia,Ujh)< B \\Uia\\\\Ujb\\{Uia,Ujh) < -T— (8.4) 
i,j,a,b i,j,a,b Km 

Using the low global correlation between vectors {uia\iev,ae[k\ ((8.3)), we bound the global 
correlation between the vectors |?7,),gv as shown below, 

E \{Ui,Uj)\^ E Y {Uia,Ujh) ^ k^ E {Uia,Ujh)<—- 
ij'evL J iJeV ^ i,jeV,a,be[k] m 

a,be[k] 

From Lemma 8.5, the low global correlation of vectors {Ui}iev implies that their squared 
length is small, i.e., 

E[l|f/,lP].o(£). 

Notice that 

ae[k] ae[k] 



By Lemma 8.3, this implies that 



E Yar[Xi\Xs] <0{-^ 

ieV 



□ 



Lemma 8.4 (High Local Correlation). IfV is an SDP solution to unique games with value 
I - rj, i.e.. 



ae[k\ 



then 



E ||[/, -[/,||2< 377. 

(<j)e£ 
We defer the proof to Appendix B.



Lemma 8.5 (Local Correlation — > Global Correlation), If the vectors {Ui}iev satisfy, 

4j] 



then the average correlation among the vectors {?7;),gy is at least ^/m, i.e., 

E (Ui,Uj)>-. 

iJeV m 

Proof. By Lemma 8.4, the vectors [Ui) satisfy 



This implies that. 



E \\Ui-Uj\\^ < 37]. 

{iJ)eE 



E {Ui,Uj)>n\Uit -Iri. 

{iJ)€E I 2 



Let E,||[/,|p = C ^ 4t]/A,„. Normalize the vectors Uj so as to make their average squared 
length equal to 1. The resulting vectors have correlation at least (1 - ij/C) ^ 1 - A,„/2. By 
Lemma 6.1, this implies that E,-ygy(?7,-, Uj)^ > ^. Since ||?7,|| < 1 for all / € V , we get 

E <[/;,[/,•> > E {UuU,) >-. 

□ 



8.3 Wrapping Up 

Our main result about Unique Games (Theorem 1.3) is a direct consequence of Theorem 8.6 
and Theorem 8.7 presented here. 

Theorem 8.6. For every positive integer m, there exists an algorithm running in time n^^'"'^^^ 
that given a unique games instance T over alphabet \k\ with value 1 - 77, finds a labelling 
satisfying 1 - 0{j-) fraction of the edges. Here A,„ is the m'^ smallest eigen value of the 
Laplacian of the constraint graph F. 

Proof. The algorithm proceeds by solving the fc^m + 2-round Lasserre SDP for the given 
instance. Starting with the SDP solution, the algorithm runs the propagation rounding algo- 
rithm starting from every possible seed set S of size \S \ = k^m. 
By Lemma 8.2, there exists one such set S for which we have. 



E[VarK|X5]] <0 




(8.5) 



Let ju' denote the distribution over global assignments output by the propagation round- 
ing scheme. For an edge (/, j), let jijj denote the local distribution over \k\^ suggested by 
the SDP solution. From Lemma 8.1, the statistical distance between u,-, and u!^. is at most 
Hence for every edge $(i, j)$,



lP[xi - jTijixj)] > = n,-j{xj)] - Ywc[Xi\Xs] - Yar[Xj\Xs] . 



Averaging over all the edges we see that, 



E P[xi=nij{xj)] 



iJeE IS 



E 



E 



P[x,- 



7r,7(xy)] 



V\Xi = TTijiXj)] 

A'../ 



- E Yar[Xi\Xs] - E Yar[X j\Xs] 

iJeE iJeE 



2 E Y£Lr[X,\Xs] 

ieV 



> Val(^) -2 E VarKIXs] 

r'eV 

where ValCV) is the SDP objective value of the solution "V. Along with (8.5), this implies 
that the algorithm on the choice of the appropriate seed set S would find a solution with 
value at least 1 - 77 - O(t-). □ 
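To make the two sampling stages of propagation rounding concrete, here is a toy sketch under a strong simplifying assumption: the local distributions of the SDP solution are replaced by a single explicit joint distribution `joint` over full assignments, which a real Lasserre solution does not provide. The function name `propagation_sample` and the example instance are hypothetical.

```python
import random
from collections import defaultdict

def propagation_sample(joint, seed_set, rng):
    """Toy propagation rounding.

    joint: dict mapping full assignments (tuples of labels) to probabilities.
           This stands in for the SDP's local distributions (an assumption).
    seed_set: list of vertex indices forming the seed set S.
    """
    n = len(next(iter(joint)))

    # Stage 1: sample the seed assignment x_S from its marginal distribution.
    seed_marginal = defaultdict(float)
    for x, p in joint.items():
        seed_marginal[tuple(x[i] for i in seed_set)] += p
    seed_values = list(seed_marginal)
    x_seed = rng.choices(seed_values,
                         weights=[seed_marginal[v] for v in seed_values])[0]

    assignment = [None] * n
    for position, i in enumerate(seed_set):
        assignment[i] = x_seed[position]

    # Stage 2: sample every other vertex independently from its marginal
    # conditioned on the sampled seed assignment.
    for i in range(n):
        if i in seed_set:
            continue
        conditional = defaultdict(float)
        for x, p in joint.items():
            if all(x[s] == assignment[s] for s in seed_set):
                conditional[x[i]] += p
        labels = list(conditional)
        assignment[i] = rng.choices(labels,
                                    weights=[conditional[l] for l in labels])[0]
    return tuple(assignment)

# Perfectly correlated labels: conditioning on vertex 0 leaves no variance.
joint = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
sample = propagation_sample(joint, seed_set=[0], rng=random.Random(1))
```

In this example every $\mathrm{Var}[X_i \mid X_S]$ is zero, so the rounded assignment always agrees with the joint distribution. This is the extreme case of the bound above, where the loss is controlled by the conditional variances.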

Theorem 8.7. There exists an algorithm that given a Unique Games instance $\Gamma$ with vertex
set $[n]$, label set $[k]$ and optimal value $1 - \varepsilon$, finds an assignment with value at least 1/2 by
rounding an ■ n'^^^^'^ ^ -round Lasserre solution. 

Proof sketch. The proof follows by combining our propagation rounding and the decomposition
theorem of [ABS10]. The latter result allows us to partition the input graph into
disjoint components each with l-ce rank at most n^^^^^^^ by removing at most 0.01 fraction 
of the edges in our input graph. An SDP solution for the input graph induces a solution for 
each of the components, and hence we can round the solution for each component separately 
using propagation rounding. □ 



Conclusions 

We have shown that 71'^'^^"'^ rounds of an SDP hierarchy suffice for solving the Unique 
Games problem on $(1 - \varepsilon)$-satisfiable instances. The best lower bound known for the hierarchy we used is $(\log\log n)^{\Omega(1)}$ [RS09, KS09], and so a natural question, with obvious relevance to the Unique Games Conjecture, is which bound is closer to the truth. The fact that our algorithm's running time for $r$ rounds is only $2^{O(r)}$ (as opposed to $n^{O(r)}$) challenges the interpretation of lower bounds in the range $[\omega(1), O(\log n)]$ as corresponding to super-polynomial running time, and so provides further motivation to the question of whether the current hierarchy lower bounds can be improved further.

With the exception of the Small-Set Expansion problem, we do not know how to translate algorithms for Unique Games into algorithms for other computational problems. We hope that our ideas will help in combining the [ABS10] subexponential algorithm for Unique Games with SDP-based methods to make progress on other Unique Games-hard computational problems. Indeed, Arora and Ge (personal communication) recently used the ideas of this work to obtain improved algorithms for 3-coloring on some interesting families of instances. A concrete open question along similar lines is whether one can get an algorithm for the Max Cut problem with approximation factor $\varepsilon$ better than the factor of the Goemans-Williamson algorithm that runs in time $\exp(n^{\mathrm{poly}(\varepsilon)})$.

For general 2-CSPs, we know that some instances will require a large number of hierarchy rounds, but it's interesting to see whether there is any clean characterization of the instances on which SDP hierarchies do well, encompassing, say, both low threshold rank graphs and planar graphs. Another interesting question is to find the right generalization of the low threshold rank condition to $k$-CSPs for $k > 2$.

A Faster Algorithms for SDP hierarchies

In this section, we argue that our rounding algorithm also works with weaker SDP hierarchies. We will show that for these weaker hierarchies, a near-optimal $m$-round solution can be computed in time $2^{O(m)}\,\mathrm{poly}(n)$. Due to the equivalence of optimization and separation, it is enough to describe a separation oracle with running time $2^{O(m)}\,\mathrm{poly}(n)$. Given a collection of vectors $\{v_{ia}\}$, the separation oracle either has to output a good assignment or it has to output a valid linear constraint violated by the inner products of the input vectors.

We argue that such a separation oracle can easily be extracted from our rounding algorithm. Our rounding algorithm for Unique Games first selects a set $S$ of roughly $m$ vertices, then samples an assignment $x_S$ for these vertices, and finally samples labels $x_i$ for the remaining vertices from the local distributions conditioned on the event $x_S$. The selection of the set $S$ depends only on the SDP vectors $\{v_{ia}\}$ but not on the local distributions (which are not known to the separation oracle).

Hence, given vectors $\{v_{ia}\}$, our separation oracle can simply work as follows:

1. Select a vertex subset using Theorem 7.1 based on the given vectors $\{v_{ia}\}$.

2. Using linear programming, find local distributions that are as consistent as possible
with the inner products of the vectors $\{v_{ia}\}$. If these local distributions match the inner
products sufficiently closely, then our propagation rounding algorithm will succeed.
On the other hand, if the local distributions do not match the inner products closely
enough, then we can find a valid linear constraint that is violated by the inner products
of the given vectors. (This separating linear constraint can be obtained from the dual
solution of the linear program that was used to find the best local distributions.)

B Omitted proofs from Section 5 and Section 8

This appendix contains the proofs omitted from Section 5 and Section 8.

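Lemma B.1 (High Local Correlation). (Lemma 8.4 restated) If $\mathcal{V}$ is an SDP solution to
unique games with value $1 - \eta$, i.e.,

\[ \mathbb{E}_{ij \in E} \sum_{a \in [k]} \|v_{ia} - v_{j\pi_{ij}(a)}\|^2 \le \eta , \]

then

\[ \mathbb{E}_{ij \in E} \|U_i - U_j\|^2 \le 3\eta . \tag{B.1} \]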

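Proof. Observe that the vectors $u_{ia}$ are projections of $v_{ia}$, and projections shrink distances, which implies that the vectors $u_{ia}$ are correlated across constraints of the Unique Games instance,

\[ \mathbb{E}_{ij \in E} \sum_{a \in [k]} \|u_{ia} - u_{j\pi_{ij}(a)}\|^2 \le \mathbb{E}_{ij \in E} \sum_{a \in [k]} \|v_{ia} - v_{j\pi_{ij}(a)}\|^2 \le \eta . \tag{B.2} \]

Let $\tilde u_{ia} = \|u_{ia}\|\, \bar u_{ia}^{\otimes 2} \otimes \bar v_{ia}$, where $\bar x$ denotes the unit vector in direction $x$. Notice that $\langle \tilde u_{ia}, \tilde u_{ib} \rangle = 0$ for distinct labels $a, b \in [k]$. We claim that the $\tilde u_{ia}$ vectors are also correlated across constraints of the Unique Games instance.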



Claim B.2.

\[ \mathbb{E}_{ij \in E} \sum_{a \in [k]} \|\tilde u_{ia} - \tilde u_{j\pi_{ij}(a)}\|^2 \le 3\eta . \tag{B.3} \]



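Proof. The following identity relates the distance of vectors to differences in their norms and the distance of the corresponding unit vectors: $\|x - y\|^2 = (\|x\| - \|y\|)^2 + \|x\|\,\|y\|\,\|\bar x - \bar y\|^2$. Since $\|\tilde u_{ia}\| = \|u_{ia}\|$, we get

\[ \|\tilde u_{ia} - \tilde u_{j\pi_{ij}(a)}\|^2 = \bigl(\|u_{ia}\| - \|u_{j\pi_{ij}(a)}\|\bigr)^2 + \|u_{ia}\|\,\|u_{j\pi_{ij}(a)}\|\, \bigl\| \bar u_{ia}^{\otimes 2} \otimes \bar v_{ia} - \bar u_{j\pi_{ij}(a)}^{\otimes 2} \otimes \bar v_{j\pi_{ij}(a)} \bigr\|^2 . \]

Since $\|x_1 \otimes x_2 - y_1 \otimes y_2\|^2 \le \|x_1 - y_1\|^2 + \|x_2 - y_2\|^2$ for unit vectors, we can further upper bound

\begin{align*}
\|\tilde u_{ia} - \tilde u_{j\pi_{ij}(a)}\|^2
&\le \bigl(\|u_{ia}\| - \|u_{j\pi_{ij}(a)}\|\bigr)^2 + \|u_{ia}\|\,\|u_{j\pi_{ij}(a)}\| \bigl( 2 \|\bar u_{ia} - \bar u_{j\pi_{ij}(a)}\|^2 + \|\bar v_{ia} - \bar v_{j\pi_{ij}(a)}\|^2 \bigr) \\
&\le 2 \|u_{ia} - u_{j\pi_{ij}(a)}\|^2 + \|v_{ia} - v_{j\pi_{ij}(a)}\|^2 .
\end{align*}

(In the last step, we again used the identity $\|x - y\|^2 = (\|x\| - \|y\|)^2 + \|x\|\,\|y\|\,\|\bar x - \bar y\|^2$, and the fact that $\|u_{ia}\|\,\|u_{j\pi_{ij}(a)}\| \le \|v_{ia}\|\,\|v_{j\pi_{ij}(a)}\|$.) By averaging over the label set and the edges of the graph, and using (B.2), it follows as claimed that

\[ \mathbb{E}_{ij \in E} \sum_{a \in [k]} \|\tilde u_{ia} - \tilde u_{j\pi_{ij}(a)}\|^2 \le 3\eta . \]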
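Both elementary facts used in the proof of Claim B.2 can be checked numerically. The following is a minimal sketch on random vectors; the helper names (`identity_gap`, `tensor_slack`) and the dimension are illustrative assumptions.

```python
import math
import random

def norm(x):
    return math.sqrt(sum(a * a for a in x))

def unit(x):
    """Unit vector in the direction of a nonzero vector x."""
    length = norm(x)
    return [a / length for a in x]

def sub(x, y):
    return [a - b for a, b in zip(x, y)]

def tensor(x, y):
    """Tensor product of two vectors, flattened to a list."""
    return [a * b for a in x for b in y]

def identity_gap(x, y):
    """|x-y|^2 minus ((|x|-|y|)^2 + |x||y| |xbar-ybar|^2); should be ~0."""
    lhs = norm(sub(x, y)) ** 2
    rhs = (norm(x) - norm(y)) ** 2 \
        + norm(x) * norm(y) * norm(sub(unit(x), unit(y))) ** 2
    return lhs - rhs

def tensor_slack(x1, x2, y1, y2):
    """(|x1-y1|^2 + |x2-y2|^2) - |x1(x)x2 - y1(x)y2|^2 for unit vectors;
    should be nonnegative."""
    lhs = norm(sub(tensor(x1, x2), tensor(y1, y2))) ** 2
    return norm(sub(x1, y1)) ** 2 + norm(sub(x2, y2)) ** 2 - lhs

rng = random.Random(0)
def random_vector(dim=4):
    return [rng.uniform(-1, 1) for _ in range(dim)]

gap = identity_gap(random_vector(), random_vector())
slack = tensor_slack(*(unit(random_vector()) for _ in range(4)))
```

For unit vectors with $a = \langle x_1, y_1 \rangle$ and $b = \langle x_2, y_2 \rangle$, the slack equals $2(1 - a)(1 - b)$, which explains why the tensor inequality needs the unit-norm assumption.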



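To finish the proof of the lemma, we relate the distances $\|U_i - U_j\|^2$ across an edge $ij \in E$ to distances of the vectors $\tilde u_{ia}$ across the constraint $\pi_{ij}$,

\begin{align*}
\langle U_i, U_j \rangle
&= \sum_{a,b} \langle \tilde u_{ia}, \tilde u_{jb} \rangle \\
&\ge \sum_{a} \langle \tilde u_{ia}, \tilde u_{j\pi_{ij}(a)} \rangle \quad \text{(using non-negativity of involved inner products)} \\
&= \sum_{a} \tfrac{1}{2} \bigl( \|\tilde u_{ia}\|^2 + \|\tilde u_{j\pi_{ij}(a)}\|^2 - \|\tilde u_{ia} - \tilde u_{j\pi_{ij}(a)}\|^2 \bigr) \\
&= \tfrac{1}{2} \Bigl( \|U_i\|^2 + \|U_j\|^2 - \sum_{a} \|\tilde u_{ia} - \tilde u_{j\pi_{ij}(a)}\|^2 \Bigr) .
\end{align*}

Rearranging gives $\|U_i - U_j\|^2 \le \sum_{a} \|\tilde u_{ia} - \tilde u_{j\pi_{ij}(a)}\|^2$. Hence, using Claim B.2,

\[ \mathbb{E}_{ij \in E} \|U_i - U_j\|^2 \le \mathbb{E}_{ij \in E} \sum_{a} \|\tilde u_{ia} - \tilde u_{j\pi_{ij}(a)}\|^2 \le 3\eta . \]

□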



Lemma (Restatement of Lemma 5.8). Let $X_1, \dots, X_n$ be $r$-local random variables over
$[k]$ and let $X_{ia}$ be the indicator of the event $X_i = a$. Suppose that the matrix
$\bigl(\mathrm{Cov}(X_{ia}, X_{jb})\bigr)_{i,j \in V,\ a,b \in [k]}$ is positive semidefinite. Then, there exist vectors $v_1, \dots, v_n$ in the
unit ball such that for all vertices $i, j \in V$ and permutations $\pi$ of $[k]$,

( 2 |Cov(X„,X,-.(,))|)' < {v,,vj) < 2] + _J_)Cov(X„,X,,)2. 

ae[k] (aMMk]^ 

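Proof. Let $\{u_{ia}\}$ be the collection of vectors such that $\langle u_{ia}, u_{jb} \rangle = \mathrm{Cov}(X_{ia}, X_{jb})$. Note that
$\|u_{ia}\|^2 = \mathrm{Var}\, X_{ia}$. Let $v_{ia} = u_{ia} + (\mathbb{E} X_{ia})\, v_\emptyset$, where $v_\emptyset$ is a unit vector orthogonal to all vectors
$u_{ia}$. Define $v_i = \sum_a \|u_{ia}\|\, \bar u_{ia}^{\otimes 2} \otimes \bar v_{ia}^{\otimes 2}$. (Here, $\bar x$ denotes the unit vector in direction $x$.) Let us
first lower bound the inner product of $v_i$ and $v_j$,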

( ^ \Cov{Xia,Xjjr(a))\f ^ [Y)\'^ia\\\\Ujn(a)\\\(Uia,Ujr:(a))\f 
ae[k] a 

-fVll,, Mil,, II I/,-, ,-, \l / ll"'"!! / ll»Mllll»;-;r(a)ll l 

^ X"' II /- - x2ll"iallll";>(<!)l 

< 2_j\Wa\\ \\Uj7:(a)\\ ' {Uia, Uj„(a)) ||,^^(^,|| 



^ll"iflll \\Ujjt(a) 



J 

a)\\ 

^2 



ll'^iall ll^_/7r(fl) 



(using Cauchy-Schwarz) 
\2 



< I^^IIMmII ll"i;r(a)ll " \{ll-ia,U j „(a)){Via,V jn{a))\^ 

(using {Via,Vj„(a)) > (Uia,Uj„(a)) and ^JViaW \\Vj„(a)\\ < 1) 

< ^IIMmII • {Uia,Ujn(a))^{Via,Vj„(a)f' 

a 

(using Cauchy-Schwarz and ||Mj;r(a)|| < 1) 

a 

< {Vi, Vj) . 

On the other hand, we can upper bound the inner product of $v_i$ and $v_j$,

{Vi,Vj) = Y^WUiaW WUjtW ■ (Uia, Ujt)^{Via,Vjbf 
a,b 






a,b 



a,b 

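Finally, the vectors $v_1, \dots, v_n$ are in the unit ball,

\[ \|v_i\|^2 = \sum_{a,b} \|u_{ia}\|\,\|u_{ib}\| \cdot \langle \bar u_{ia}, \bar u_{ib} \rangle^2 \langle \bar v_{ia}, \bar v_{ib} \rangle^2 = \sum_{a} \|u_{ia}\|^2 \le 1 . \]

Here, we are using the fact that $\langle v_{ia}, v_{ib} \rangle = 0$ for all distinct $a, b \in [k]$. □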

C Facts about Variance 

Lemma C.1. Let $X$ and $Y$ be jointly-distributed random variables. Assume that $Y$ has finite
range. Let $Z$ be the orthogonal projection of the random variable $X$ onto the subspace of
functions of the random variable $Y$. Then,

\[ \mathbb{E}\, \mathrm{Var}[X \mid Y] = \mathbb{E} X^2 - \mathbb{E} Z^2 . \]

Proof. By construction $Z$ is a function $f(Y)$ of the random variable $Y$ and $X - Z$ is orthogonal
to all functions of the variable $Y$. Hence, $\mathbb{E}[X \mid Y = y] = f(y)$. Therefore, the expected
variance of $[X \mid Y]$ is

\[ \mathbb{E}\, \mathrm{Var}[X \mid Y] = \mathbb{E} X^2 - \mathbb{E} \bigl( \mathbb{E}[X \mid Y] \bigr)^2 , \]

which gives the desired identity using $Z = f(Y)$. □
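As a sanity check of Lemma C.1, the identity can be verified with exact rational arithmetic on a small joint distribution. The distribution below and the helper names are illustrative assumptions. The projection $Z$ is computed as the conditional expectation $\mathbb{E}[X \mid Y]$, as in the proof.

```python
from fractions import Fraction as F

def conditional_moments(joint):
    """Group a joint distribution {(x, y): prob} by the value of y.
    Returns {y: (P[Y=y], E[X | Y=y], E[X^2 | Y=y])}."""
    groups = {}
    for (x, y), p in joint.items():
        mass, first, second = groups.get(y, (F(0), F(0), F(0)))
        groups[y] = (mass + p, first + p * x, second + p * x * x)
    return {y: (m, s1 / m, s2 / m) for y, (m, s1, s2) in groups.items()}

def expected_conditional_variance(joint):
    """E_Y Var[X | Y]."""
    return sum(m * (second - mean ** 2)
               for m, mean, second in conditional_moments(joint).values())

def second_moment_of_projection(joint):
    """E Z^2 for Z = E[X | Y], the projection of X onto functions of Y."""
    return sum(m * mean ** 2
               for m, mean, _ in conditional_moments(joint).values())

# Toy joint distribution of (X, Y); Y takes three values.
joint = {
    (0, "a"): F(1, 6), (1, "a"): F(1, 6),
    (2, "b"): F(1, 3),
    (5, "c"): F(1, 6), (1, "c"): F(1, 6),
}
second_moment_x = sum(p * x * x for (x, _), p in joint.items())
lhs = expected_conditional_variance(joint)
rhs = second_moment_x - second_moment_of_projection(joint)
```

Using `Fraction` avoids floating-point error, so the two sides can be compared for exact equality.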

Lemma C.2. Let $X$ and $Y$ be as in the previous lemma. Suppose the range of $Y$ has cardinality 2. Then,

\[ \mathbb{E}\, \mathrm{Var}[X \mid Y] = \mathrm{Var}\, X - \mathrm{Cov}(X, Y)^2 / \mathrm{Var}(Y) . \]

Proof. Without loss of generality, we may assume that $\mathbb{E} X = \mathbb{E} Y = 0$ and $\mathbb{E} Y^2 = 1$. Then,
the set of random variables $\{1, Y\}$ is an orthonormal basis for the subspace of functions of $Y$.
Let $\rho = \mathbb{E} XY$. Then, $\rho Y$ is the orthogonal projection of $X$ to the subspace of functions of $Y$.
(Here, we use the assumption $\mathbb{E} X = 0$.) Hence, using the previous lemma,

\[ \mathbb{E}\, \mathrm{Var}[X \mid Y] = \mathbb{E} X^2 - \mathbb{E} (\rho Y)^2 = \mathbb{E} X^2 - \rho^2 , \]

which is the desired identity because $\mathbb{E} X^2 = \mathrm{Var}\, X$ and $\rho^2 = \mathrm{Cov}(X, Y)^2 / \mathrm{Var}\, Y$. □
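The identity of Lemma C.2 can likewise be checked exactly for a binary $Y$. The following sketch uses a hypothetical joint distribution and helper names chosen for illustration, and it does not rely on the normalization $\mathbb{E} X = \mathbb{E} Y = 0$ made in the proof.

```python
from fractions import Fraction as F

def expectation(joint, f):
    """E f(X, Y) for a joint distribution {(x, y): prob}."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

def expected_conditional_variance(joint):
    """E_Y Var[X | Y], computed value by value of Y."""
    total = F(0)
    for y0 in {y for _, y in joint}:
        mass = sum(p for (_, y), p in joint.items() if y == y0)
        mean = sum(p * x for (x, y), p in joint.items() if y == y0) / mass
        second = sum(p * x * x for (x, y), p in joint.items() if y == y0) / mass
        total += mass * (second - mean ** 2)
    return total

# Toy joint distribution of (X, Y) with Y in {0, 1}.
joint = {(0, 0): F(1, 4), (1, 0): F(1, 4), (1, 1): F(1, 3), (3, 1): F(1, 6)}

mean_x = expectation(joint, lambda x, y: x)
mean_y = expectation(joint, lambda x, y: y)
var_x = expectation(joint, lambda x, y: x * x) - mean_x ** 2
var_y = expectation(joint, lambda x, y: y * y) - mean_y ** 2
cov_xy = expectation(joint, lambda x, y: x * y) - mean_x * mean_y

lhs = expected_conditional_variance(joint)
rhs = var_x - cov_xy ** 2 / var_y
```

The assumption that $Y$ takes only two values matters here: it makes every function of $Y$ affine in $Y$, so the best predictor of $X$ from $Y$ is the linear one.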